-
Ab (Abutaleb)
- No method-specific parameters.
-
Be (Bernsen)
-r <region_size>
-c <contrast_limit>
-low
For each pixel, the threshold
T(x,y) = (Z_low + Z_high)/2
is used, where Z_low and Z_high are the lowest and highest
gray level pixel values in a square neighborhood of size
(region_size x region_size) about the pixel (x,y).
However, if the contrast measure
C(x,y) = Z_high - Z_low < 'contrast_limit'
the pixel is set to 255 if -low is specified, else to 0.
-
ETM (Eikvil, Taxt & Moen)
-r <region_size>
-s <small_window_size>
-l <mean_diff_limit>
-w <weight>
-mi <global_min>
-ma <global_max>
The pixels inside a small window S of size
(small_window_size x small_window_size) are thresholded on
basis of the clustering of the pixels inside a larger
concentric window L of size (region_size x region_size).
S and L slide across the image in steps equal to the side
of S. For all pixels inside L, a threshold T is
calculated to divide the pixels into two clusters. If the two
estimated cluster means mu1 and mu2 are further apart than a
limit, mu2 - mu1 > mean_diff_limit the pixels inside S are
binarized using the threshold value T. When mu2 - mu1 <
mean_diff_limit, the pixels inside S are all assigned to
the class with the closest updated mean value, being it print
or background.
bin_method is used to select the global binarization
method which finds T. Use bin_method =
OTSU or bin_method = KITTLER_ILLINGWORTH.
weight is used to calculate the updated mean
values MU1 and MU2. For each small window S, the
new values are caluculated as
MU1 = 'weight' * MU1 + (1.0-'weight') * mu1
MU2 = 'weight' * MU2 + (1.0-'weight') * mu2
weight = 1.0 means that the initial estimates
for MU1 and MU2 are kept forever, while
weight = 0.0 means that the mean values
mu1 and mu2 of the
two clusters in L are used in the next step.
For extremely difficult images, a preprocessing
step can be used on
the image. Replace all pixel values higher than the
value glob_max by
glob_max itself, and similarly, replace all
pixel values lower
than glob_min by glob_min.
-
KI (Kittler & Illingworth).
- Global method, no method-specific parameters.
-
KSW (Kapur, Sahoo & Wong).
- Global method, no method-specific parameters.
-
man (manually selected threshold).
- The image is thresholded using threshold.
-t <threshold>
-
MH (Mardia & Hainsworth)
-s
-d <diff-limit>
Mardia and Hainsworth`s method first makes an initial
binarization. Then several steps are iterated until
convergence is reached. First, the estimated mean values mu1
and mu2, and the number of pixels n_i in both foreground and
background of the current binarization are calculated. Then a
threshold T based on these values is calculated. Then, for
each pixel, a weighted mean, G, of the pixel and its eight
neighbors is calculated. If G<=T, the pixel is classified
as ``black', otherwise ``white'. The weights on the neighbor
pixels, used to calculate G, may be equal, or they may be
calculated based on a semivariogram approximation. If
use_sem_var >= 1, the semivariogram approach is used. The
last step of each iteration is to smooth the new binary image
using a (3 x 3) median filter. diff_limit is used to test if
convergence has been reached. The number of different pixels
of two successive binarizations is counted, and compared to
diff_limit.
-
Ni (Niblack)
-r <region-size>
-k <weight>
The idea of this method is to vary the threshold over the
image, based on the local mean and local standard deviation.
The threshold at pixel (x,y) is calculated as
T(x,y) = 'weight' * sigma(x,y) + mu(x,y)
where sigma(x,y) and mu(x,y) are the estimated standard
deviation and mean value in a local neighborhood of (x,y) of
size (region_size x region_size). The value of weight is
used to adjust how much of the foreground object edges that
are taken as a part of the object. A recommended value for
weight is -0.2 for images with black foreground objects, and
weight = +0.2 for images with white foreground objects.
-
NR (Chow & Kaneko / Nakagawa & Rosenfeld)
-r <region-size>
-m <mean-diff-limit>
-mi <min-ratio>
-ma <max-ratio>
-s <sdev-limit>
-vtp <vtp-limit>
The method is a slight modification of Chow and Kaneko`s
method. The result is returned in outband. This method
calculates a thresholding surface. The image is divided into
windows of size (region_size x region_size), and the
histograms for each window are tested for bimodality. Each
histogram is approximated by a mixture of two Gaussian
distributions. Let mu1 and mu2 be the estimated mean values,
and sigma1 and sigma2 be the estimated standard deviations of
the two distributions. Also, let sigma_tot be the total
standard deviation, and vtp be the valley-to-peak ratio. The
bimodality test requires the following five criteria to hold:
(1) sigma_tot > 'sdev_limit'
(2) mu2 - mu1 > 'mean_limit'
(3) sigma1/sigma2 > 'min_ratio'
(4) sigma1/sigma2 > 'min_ratio'
(5) vtp < 'valley_to_peak_limit'
For each bimodal window, a threshold is calculated based on the
mu_1, mu_2, sigma_1 and sigma_2 parameters of the mixture
distribution.
Thresholds for the other windows are calculated on basis of
the thresholds for the bimodal windows. The thresholds are
first smoothed, and then interpolated to give a threshold
surface, having a single threshold value for each image pixel.
The image is binarized using the threshold surface.
-
Ot (Otsu). Global method.
- No method-specific parameters.
-
Pa (Parker)
-r <region-size>
-k <weight>
-m <mean-limit>
-s <stdev-limit>
-abs
Parkers method should normally not be used with other
parameters than -r, if any. The other parameters
were included to manipulate with the internal tests in
the algorithm, since some of these were obviously
wrong in the original paper. More sensible values are
given as default in this program.
This method first detects edges, and then the interior
of objects between edges are filled. First, for each
pixel (x,y) in the input image Z, calculate
D(x,y) = min [Z(x,y)-Z(x_i,y_i)] ,
i=1,..,8
where (x_i,y_i), i=1,.., 8 are the 8 eight-connected
neighbors of (x,y). Thus, the negative of the gradient
in the direction of the brightest neighbor is found.
Then D is broken up in regions of size region-size
x region-size, and for
each region, the sample mean and standard deviations
are calculated. Both values are smoothed, and then
bilinearly interpolated to give two new images M and
S, originating from the mean values and standard
deviations, respectively. Then for all pixels (x,y),
if M(x,y) >= mean-limit or S(x,y) < stdev-limit,
then the pixel is regarded as part of a flat region,
and remains unlabeled. [In Parkers paper, the test is
M(x,y) >= -1 or S(x,y) >= -1, which is fulfilled by
all pixels, since S(x,y) is never negative.] Else, if
D(x,y) < M(x,y) + weight S(x,y), then (x,y) is labeled
print; else, (x,y) remains unlabeled. The resulting
binary image highlights the edges. This is followed by
pixel aggregation and region growing steps to locate
the remaining parts of the print objects.
The following default values are used: region-size =
16, mean-limit = -1.0, stdev-limit = 1.0, and
weight = -1.0.
-
TFJ (Taxt, Flynn & Jain)
-r <region-size>
-s <small-window-size>
-t <training-set-size>
Taxt, Flynn and Jain`s method divides the image into
non-overlapping windows of size (small_window_size x
small_window_size). For each window, a larger concentric
window of size (region_size x region_size) pixels. The
histogram of each of the larger windows is approximated by a
mixture of two Gaussian distributions. The parameters of the
mixture are estimated using an expectation-maximization (EM)
algorithm. In each small window, the pixels are classified
using the quadratic Bayes' classifier.
A training set of training_size pixels are picked on random,
but edge pixels are avoided in the training set. The EM
algorithm requires global start values for the estimated class
means mu1 and mu2, and estimated standard deviations sigma1
and sigma2. These are obtained by k-means clustering of the
whole input band. The EM algorithm also needs an initial
estimate of the mixing weight pi, and pi = 0.5 is used.
-
WR1 (White and Rohrer I)
-l <nof-lookahead-pixels>
In their 1983 paper, White and Rohrer proposed two very
different binarization methods. The first they called "Dynamic
threshold algorithm", which we call WR1 (White and Rohrer
method I). The second they called "Integrated function
algorithm" (WR2, (White and Rohrer method II)
The lookahead parameter gives the number of lookahead pixels
used in the biased running average that is used in deciding
the threshold at each pixel location. Good values for
lookahead are 8 or (8 + Ixsize(inband)).
-
WR2 (White & Rohrer II)
-a <activity-limit>
-ma <max-value>
-m <mean-filter-size>
-s <search-vector-length>
-sl <spot-label>
-l <min-obj-width>
-d
-single
-double
White and Rohrer`s method II (Integrated function algorithm)
uses a gradient-like operator called the activity on the
image. Pixels with activity below activity_limit are labeled
0. The other pixels are further tested. If the Laplacian
edge operator of the pixel is positive, the pixel is labeled
+, otherwise -. That way a three-level label-image is
constructed, with legal pixel values +, 0 and -. The
idea is that in a sequence of labels, edges are identified as
-+ transitions or +- transitions. Object pixels are
assumed to be + and 0 labeled pixels between a -+ and
+- pair. For each pixel to be classified, a search vector of
length search_vec_len for each search direction is centered
on the pixel. The -+ and +- must be inside the seach
vector to be recognized. By using a low value on
search_vec_len, only lines and small filled areas will be
captured.
If spot_size_limit > 1, then small +-labeled regions
having less than spot_size_limit pixels are relabeled as
0-regions.
By specifying activity_limit = 0, Kitler and Illingworth`s
method is used to set the activity threshold. An upper bound
for this value is specified by max_activity.
A lower limit on the distance between the -+ and +- for
pixels to be classified as object pixels is specified by
distance_limit. This can be used to eliminate the thinnest
lines in inband.
The first improvement suggested by Trier and Taxt [1] can be
used in bin_wr2:
(1) The input image is smoothed by a mean filter of size
(mean_size x mean_size) to remove stochastic
noise. To run the original White and Rohrer`s method,
mean_size = 1 must be used.
Variations on how pixels are classified as print or
background are selected though the mode parameter:
If mode = WR_TWO_BY_TWO, the original approach is used. Two
times two pixels are classified at a time, requiring that all
four pixels are inside either horizontal or vertical object
pixel sequences.
The reasons for classifying (2 x 2) pixels at a time in the
original implementation was partly to save computational time,
partly to reduce noise. With faster computers, a more
time-consuming algorithm may be allowed.
If mode = WR_DOUBLE_VECTORS, then the idea is that a single
pixel being part of an object pixel sequence is not enough. It
has to have support from at least one of its neighbors. Either
one of its vertical neighbors and the pixel itself have to be
inside horizontal object pixel sequences or one of its
horizontal neighbors and the pixel itself must be inside
vertical object pixel sequences to be recognized as object
pixels.
If mode = WR_SINGLE_VECTOR, only a single search vector is
used in each direction.
If diagonal >= 1, then the horizontal and vertical search
directions are complemented with two diagonal search
directions. If diagonal == 0, then only the horizontal and
vertical search directions are used.
-
IWR (Improved White & Rohrer)
-a <activity-limit>
-ma <max-value>
-m <mean-filter-size>
-sl <spot-label>
This is an improved version of WR2. Some of the switches
allowed for WR2 have been removed, since the use of search
vectors to locate print pixels in the three-level label image
is abandoned in IWR. The switches still in use have the same
meaning as for WR2.
The following approach is used. First, all "+"-marked
regions are labeled print and "-"-marked regions are
marked background. Then, for each "0"-marked region, the
number of "-" and "+" pixels that are 8-connected to the
region is counted. If the number of "+" pixels is in
majority, the "0" -marked region is labeled print,
otherwise background.
This approach prodces a lot of noise which can be seen as
spurious print components, which must be removed by using the
validation step (the -v option described earlier).
-
YB (Yanowitz & Bruckstein)
-b <beta>
-m <max-nof-iter>
-t <threshold>
A potential surface passing through the image at local maxima
of the gradient image is used as a thresholding surface. The
gradient magnitude image is computed, using for example
Sobel`s edge operators. This image is thresholded and thinned
to one-pixel wide lines, to identify edge pixels. The original
image is smoothed by a (3x3) mean filter to remove stochastic
noise. A potential surface is constructedby an iterative
interpolation scheme. The interpolation surface is made to
pass through the edge pixels. An interpolation residual R(x,y)
is calculated for each non-edge pixel, and the new pixel value
P(x,y) at iteration n is calculated as
P[n](x,y) = P[n-1](x,y) + (beta * R[n](x,y)) / 4
Yanowitz and Bruckstein recommended to use 1.0 < beta
< 2.0 for fast convergence (over-relaxation). However,
we found that by using beta > 1.0, P(x,y) started to diverge
after 5-10 iterations. So beta <= 1.0 must be used, and
beat = 1.0 works well.
Next, the image is binarized using the thresholding surface.
In the original algorithm, so-called "ghost" objects are
removed by a validation step. This validation step is
in a separate library function named remove_ghost.