Binarization of Document Images with Various Object Sizes
Many document image binarization techniques attempt to separate foreground from background, but many of them fail to detect all text pixels correctly because of degradations. In this paper, a new binarization method for document images is presented. The proposed method is based on Sauvola’s method, the most commonly used binarization method, which performs relatively well on classical documents. However, three main defects remain: the window parameter of Sauvola’s formula does not adapt automatically to the image content, the method is not robust to low contrast, and it is not invariant with respect to contrast inversion. Thus, for some documents, the content may not be retrieved correctly. In this paper, we address one of these limitations of Sauvola’s binarization: its poor handling of objects of various sizes. The well-known Chan-Vese active contour model is used in combination with the result of the Sauvola binarization step to guarantee good-quality binarization for both small and large objects within a single document, without manually adjusting the window size to the document content. The efficiency of the proposed method is shown on several document images with various object sizes.
Document image; binarization; Sauvola’s method; Chan-Vese active contour model.
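The window-size limitation described in the abstract can be illustrated with a minimal sketch of Sauvola's local threshold, T = m (1 + k (s / R - 1)), where m and s are the mean and standard deviation of the gray levels in a square window around each pixel. This is not the implementation used in the paper: the function name, the synthetic test image, and the values k = 0.5 and R = 128 (commonly used defaults) are assumptions made for illustration, and the Chan-Vese step is not shown.

```python
import math

def sauvola_binarize(image, window=15, k=0.5, R=128.0):
    """Return a mask (True = dark foreground) using Sauvola's local threshold.

    image  : list of rows of gray levels in [0, 255]
    window : odd side length of the square neighbourhood
    k, R   : Sauvola constants (assumed defaults, not taken from the paper)
    """
    h, w = len(image), len(image[0])
    r = window // 2
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window at the image borders.
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            vals = [image[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            threshold = mean * (1.0 + k * (std / R - 1.0))
            mask[y][x] = image[y][x] < threshold
    return mask

# Synthetic example: a large dark 11x11 object on a white 21x21 background.
img = [[0 if 5 <= y <= 15 and 5 <= x <= 15 else 255 for x in range(21)]
       for y in range(21)]

small = sauvola_binarize(img, window=3)   # window smaller than the object
large = sauvola_binarize(img, window=21)  # window larger than the object

print("centre pixel, window=3: ", small[10][10])
print("centre pixel, window=21:", large[10][10])
```

With the small window, the neighbourhood of an interior pixel is uniformly dark, so the local standard deviation is zero and the pixel is not classified as foreground, while the object's border is still detected. With the larger window the interior is recovered. This is the kind of manual window tuning that the proposed method aims to avoid.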