
Which measure indicates a smooth variation of data?

I am trying to compare text and non-text regions based on the thickness of their lines/strokes. Using the distance transform and some fiddling thereafter, I managed to obtain the thickness (actually half the thickness) of each stroke comprising the features in a picture.
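Roughly, that extraction step looks like this (a minimal sketch assuming OpenCV and scikit-image; the file name, the Otsu binarization, and sampling along the skeleton are placeholder choices, not necessarily exactly what I did):

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

# Load a region and binarize it so strokes are white (foreground).
gray = cv2.imread("region.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

# Distance from every stroke pixel to the nearest background pixel.
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)

# Along the stroke skeleton, that distance is roughly half the stroke width.
ridge = skeletonize(binary > 0)
half_thickness = np.round(dist[ridge]).astype(int)
print(half_thickness)
```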

Here's a typical result of a program run:

1. Text region

34444433343554335533553555545544455445533444444344455435553335545556665444445654444444444444444444444444455434554554455444456544444445555445555543355556665544665444535444553354434553444444444444455444445544444454444444444444444444444444455442444444554444444544444444444444554444456444554414454444444444444444444554444445543454445443444544434443344443334442133223332221

2. Non-text

So is there any statistical measure, more sophisticated than the standard deviation, that will indicate the difference between the two datasets: one varies gradually while the second has drastic variations? (I included the scary numbers to illustrate what I'm attempting to quantify!)

Also, please note that the number of data points will not be the same, as I'll be comparing different regions against some experimentally determined threshold on the SD (or some other measure), not regions against each other.

If you are interested in measuring smoothness, the standard deviation of the differences between adjacent thicknesses should be much smaller for text than for non-text.

You can thus simply convert

34444433343554335533553555545544455445533444444344455435553335545556665444445654444444444444444444444444455434554554455444456544444445555445555543355556665544665444535444553354434553444444444444455444445544444454444444444444444444444444455442444444554444444544444444444444554444456444554414454444444444444444444554444445543454445443444544434443344443334442133223332221

into

1000(-1)000…

(1 = 4-3, 0 = 4-4, etc.). The standard deviation of this list of differences is small for text regions (in your example, the list contains many zeros).
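A minimal sketch of this statistic in Python (NumPy assumed; the sample data is a prefix of your example string):

```python
import numpy as np

def diff_std(thicknesses):
    """Standard deviation of the differences between adjacent values."""
    return np.diff(thicknesses).std()

text_region = [int(c) for c in "3444443334355433553355"]
print(diff_std(text_region))  # small for text, larger for non-text
```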

If you need to keep using numbers between 0 and 9 for the difference between thickness t1 and thickness t2, you can rescale: round((t2-t1+9)/2).
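For example (half-up rounding is spelled out here because Python's built-in round() rounds halves to even):

```python
def rescaled_diff(t1, t2):
    # t2 - t1 lies in [-9, 9], so (t2 - t1 + 9) / 2 lies in [0, 9].
    return int((t2 - t1 + 9) / 2 + 0.5)

print(rescaled_diff(0, 9))  # largest increase -> 9
print(rescaled_diff(9, 0))  # largest decrease -> 0
print(rescaled_diff(4, 4))  # no change        -> 5 (4.5 rounded half-up)
```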

The thought that comes to mind is that you could do a wavelet transform on a chunk and then look at the average energy in the high-frequency wavelets.

If you're not familiar with wavelets, the simplest one to describe is the Haar wavelet. Assuming that the number of points you have sampled is 2^n, you can calculate it as follows:

  1. Divide your data into pairs of points.
  2. Take 1/2 of the difference within each pair. Those are the detail-wavelet coefficients.
  3. Take the average of each pair. This gives you 2^(n-1) points. Recursively do a wavelet transform on those.

For each level of the Haar wavelet, take the average of the squared coefficients. If your data really looks like what you've described, this statistic will differ sharply between the two region types for the first few levels. Experiment, decide where your threshold is, and you'll probably have a pretty reliable test. (I would recommend having three possible answers from your test: "Text", "Not text", and "unclear". Look at the "unclear" examples and then improve your test.)
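A minimal sketch of the whole test in Python (it assumes the input length is a power of two, as above; the threshold itself is up to your experiments):

```python
import numpy as np

def haar_level_energies(data):
    """Average squared Haar detail coefficient at each level.

    energies[0] is the highest-frequency level; for text regions it
    should be much smaller than for non-text regions.
    """
    data = np.asarray(data, dtype=float)
    energies = []
    while len(data) > 1:
        pairs = data.reshape(-1, 2)
        detail = (pairs[:, 0] - pairs[:, 1]) / 2  # step 2: half the difference
        data = pairs.mean(axis=1)                 # step 3: averages, recursed on
        energies.append(np.mean(detail ** 2))
    return energies

text_region = [int(c) for c in "3444443334355433"]  # 16 = 2^4 points
print(haar_level_energies(text_region))
```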
