
How to use an optimization algorithm to find the best possible parameter

I'm trying to find a good interval of colors for color masking in order to extract skin from images.

I have a database with images and masks for extracting skin from those images. Here's an example of a sample:

[sample image]

I'm applying the mask to each image in order to get something like this:

[masked sample result]

I'm collecting all the pixels from all the masked images and removing the black pixels in order to keep only the pixels containing skin. Using this method, I'm able to gather pixels covering different shades of skin color from different people.

This is the code I'm using for this:

import cv2
import numpy as np

pixel_batches = []
for img_color, img_mask in zip(COLORED_IMAGES, MASKS):
    # masking
    img_masked = cv2.bitwise_and(img_color, img_mask)

    # flattening into an (N, 3) array of pixels
    pixel_batches.append(img_masked.reshape(-1, 3))

# merging all pixels from all samples
all_pixels = np.concatenate(pixel_batches, axis=0)

# removing black pixels
all_pixels = all_pixels[~(all_pixels == 0).all(axis=1)]

# sorting pixels lexicographically (note: np.sort with the default axis
# would sort each pixel's channels independently, corrupting the colors)
all_pixels = all_pixels[np.lexsort(all_pixels.T)]

# reshaping into an NB_PIXELS x 1 image containing all skin colors
# from the dataset samples, in order to create the histogram
all_pixels = all_pixels.reshape(len(all_pixels), 1, 3)
all_pixels = cv2.cvtColor(all_pixels, cv2.COLOR_BGR2YCR_CB)

After extracting all the shades of skin color, I create a histogram that shows which colors are most common. The code for creating the histogram is too long to include here, but this is the result:

[histograms of the gathered skin-pixel values]

Then I take the turning point of each channel's graph and choose a distance for that channel, say 20. The interval for that channel is obtained as [turning point - 20, turning point + 20]:

[turning point with the ±20 interval marked on the histogram]
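For reference, here is a minimal sketch of how the turning point and the ±20 interval could be read off such histograms, assuming the all_pixels array built in the code above (the channel names depend on the color space used):

import numpy as np

channels = all_pixels.reshape(-1, 3)
distance = 20  # hand-picked per channel, as described above

for c in range(3):
    hist, _ = np.histogram(channels[:, c], bins=256, range=(0, 256))
    turning_point = int(np.argmax(hist))  # most frequent value = the peak
    interval = (max(0, turning_point - distance),
                min(255, turning_point + distance))
    print(f"channel {c}: turning point {turning_point}, interval {interval}")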

So let's say that we got the following:

R:

  • turning point: 142
  • distance: 61
  • interval: [81, 203]

G:

  • turning point: 155
  • distance: 10
  • interval: [145, 165]

B:

  • turning point: 109
  • distance: 14
  • interval: [95, 123]

I would then use these intervals to create masks for the colored images from the dataset and extract the skin (left: mask from my intervals, right: ground-truth mask):

[left: mask from my intervals; right: ground-truth mask]

The masks extracted using my intervals are compared with the dataset's preexisting masks, and the accuracy is calculated to see how effective the intervals I obtained are:

precision_moy = 0
accuracy_moy = 0

for image, img in zip(COLORED, GROUND_TRUTH):
    Min = np.array([81, 145, 95], np.uint8)
    Max = np.array([203, 165, 123], np.uint8)

    mask = cv2.inRange(image, Min, Max)

    TP = 0  # True Positive
    TN = 0  # True Negative
    FP = 0  # False Positive
    FN = 0  # False Negative

    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] == 255 and img[i, j, 0] == 255:
                TP += 1
            if mask[i, j] == 0 and img[i, j, 0] == 0:
                TN += 1
            if mask[i, j] == 255 and img[i, j, 0] == 0:
                FP += 1
            if mask[i, j] == 0 and img[i, j, 0] == 255:
                FN += 1

    precision = TP / (TP + FP)
    accuracy = (TP + TN) / (TP + TN + FP + FN)

    precision_moy += precision
    accuracy_moy += accuracy

precision_moy /= len(COLORED)
accuracy_moy /= len(COLORED)

I keep changing the intervals, testing, and calculating the accuracy in order to find the best possible interval for each channel. The change is done by multiplying the distance by a number between 0 and 2. For example:

OLD R:

  • turning point: 142
  • distance: 61
  • interval: [81, 203]

NEW DISTANCE = OLD DISTANCE * 0.7 = 61 * 0.7 ≈ 43

NEW R:

  • turning point: 142
  • distance: 43
  • interval: [99, 185]
  • To get a wider interval I would multiply by a number in ]1, 2]
  • To get a narrower interval I would multiply by a number in ]0, 1[
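In code, this scaling step is a one-liner (a trivial sketch; turning_point and distance are the values above):

def scaled_interval(turning_point, distance, factor):
    # factor in ]0, 1[ narrows the interval, factor in ]1, 2] widens it
    d = round(distance * factor)
    return max(0, turning_point - d), min(255, turning_point + d)

print(scaled_interval(142, 61, 0.7))  # (99, 185), the NEW R above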

Now, to my question:

I would like to find the best possible interval for each color space using an optimization method instead of manually and randomly changing the intervals. What optimization method should I use and how would I use it?

Thank you for taking the time. Your help is appreciated.

One basic approach that converges quickly but may not yield the global optimum is Hillclimbing.

Hillclimbing is a form of local search that can be used in this case.
Hillclimbing works by moving from one state (or solution) to the next depending on the score or performance of the state. If no better state can be found, that state is returned as the solution.

There are multiple ways of implementing Hillclimbing; in your case I would do something like this:

The State: in your case, an item containing the Min and Max numpy arrays, with the accuracy or F-measure of the mask created by applying these arrays to the image as its score property.

For now I suggest you use only symmetrical ranges to massively reduce the search space.

Starting State
You can create a starting state at random, taking a random interval for each channel (Red, Green, Blue). This is especially useful if you run this algorithm multiple times. Determine the maximum and minimum for each interval based on your histograms.

Iteration Process (this is where the searching is done)
You want to create a loop in which you generate successor states for the current state: increase or decrease the interval of each channel by, say, 10, and let every combination of those new intervals be a successor state.
Another way is to switch channels each iteration: in the first iteration you create one successor state with the Red channel of the current state decreased by 10 and one with it increased by 10; in the second iteration you change the Green channel, in the third the Blue channel, and so on.

You then create a mask from each successor state and apply it to the image, thereby determining the performance of each successor state.
Select the best-performing successor state and make it the current state if its performance is better.

Repeat this process until the best successor state performs worse than the current state; then you know you have hit a local optimum. Return this state as the solution. A sketch of this loop follows.
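A minimal sketch of the channel-switching variant, assuming a score(Min, Max) helper that applies cv2.inRange with the given bounds over the dataset and returns the average accuracy (step, the starting ranges, and max_iter are arbitrary choices, not fixed values):

import random
import numpy as np

def hill_climb(score, step=10, max_iter=1000):
    # random symmetric starting interval per channel: (center, distance)
    centers = np.array([random.randint(40, 215) for _ in range(3)])
    dists = np.array([random.randint(10, 40) for _ in range(3)])

    def bounds(c, d):
        # build the Min/Max arrays for cv2.inRange from center +- distance
        return (np.clip(c - d, 0, 255).astype(np.uint8),
                np.clip(c + d, 0, 255).astype(np.uint8))

    current = score(*bounds(centers, dists))
    stuck = 0
    for it in range(max_iter):
        ch = it % 3  # switch channel each iteration
        best_score, best_dists = current, None
        for delta in (-step, step):  # two successors: narrower and wider
            cand = dists.copy()
            cand[ch] = max(1, cand[ch] + delta)
            s = score(*bounds(centers, cand))
            if s > best_score:
                best_score, best_dists = s, cand
        if best_dists is None:
            stuck += 1
            if stuck == 3:  # no channel can be improved: local optimum
                break
        else:
            dists, current, stuck = best_dists, best_score, 0
    return bounds(centers, dists), current

Run it several times from different random starting states and keep the best result, as discussed under Problems below.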

Problems
As noted above, this algorithm will only find a local optimum for the given starting state, because of its greediness.
You may therefore want to restart the algorithm from different starting locations, allowing more of the search space to be explored and increasing the chance that the global maximum is found.
If you have multiple threads, you can run multiple instances in parallel and finally return the best state among the results of all instances.

Hillclimbing is not the best optimization algorithm, but it is very fast and easy to implement.

I would suggest genetic optimization, which can be easily implemented for a problem as simple as yours. Since the problem is relatively "small", it should not take much longer to find an optimal solution than a local optimization method like the Hillclimb suggested by @Leander. A genetic algorithm is a metaheuristic search, so it is not guaranteed to find the optimal solution, but it should get you very close. In fact, for such a small problem the chance of finding the global optimum is very high.

As a start, I would recommend taking a look at DEAP so you don't have to implement anything yourself (https://deap.readthedocs.io/en/master/). It contains very good implementations of many genetic-algorithm variations, and there are tutorials with nice examples. With a bit of effort you should be able to compose a simple optimization algorithm in a day or two.

The genetic algorithm will be denoted GA from now on for simplicity.

Some tips where to start:

  • I suggest you start with the simplest variation, eaSimple, in DEAP. If that is not satisfactory you can always move to something a little more sophisticated, but I think it won't be necessary.
  • your Individual in the GA will have 6 components -> [blue_low, blue_high, green_low, green_high, red_low, red_high]; this also addresses the problem of asymmetric intervals mentioned by @Leander in the comments
  • mutations are done by randomly altering elements of the individual
  • for the fitness function you can use your accuracy as you are computing it now

That is essentially all you need to build a GA for your problem. The OneMax example at https://deap.readthedocs.io/en/master/examples/ga_onemax.html should get you up and running; you just need to define your own individuals, operators, and fitness evaluation function as mentioned in the previous steps.
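A minimal sketch of that setup with DEAP's eaSimple (the evaluate stub is a placeholder you would replace with your mask/accuracy computation; DEAP expects fitness values as tuples):

import random
from deap import base, creator, tools, algorithms

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

def evaluate(ind):
    # placeholder fitness: replace with your average accuracy, using
    # Min = ind[0::2], Max = ind[1::2] (repair pairs where low > high)
    return (sum(ind) / (6 * 255),)

toolbox = base.Toolbox()
# one gene per component: [blue_low, blue_high, green_low, green_high, red_low, red_high]
toolbox.register("attr_int", random.randint, 0, 255)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_int, n=6)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=0, up=255, indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)

pop = toolbox.population(n=50)
pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2,
                             ngen=40, verbose=False)
best = tools.selBest(pop, k=1)[0]
print(best, best.fitness.values)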

A final note on the use of general optimization methods: as I understand it, this is a discrete problem in 6 dimensions, since you have 6 components (blue_low, blue_high, green_low, green_high, red_low, red_high) and each of them has only 256 possible values. This rules out most optimization methods, since they require the problem to be continuous.

In your current algorithm, you are finding the mode (i.e., the peak) of the color data and then taking the bins (color values) symmetrically around it.

For a normal distribution curve, the percentage of the population covered by a given number of standard deviations around the mean is as shown below:

[normal distribution curve]

In a normal distribution, the mean, median, and mode are the same. However, if your distribution is skewed, the population on the left side of the mean won't be the same as the population on the right side. So a simple adjustment you can make is as follows:

Let p_left be the % of the population to the left of the peak and p_right the % of the population to the right. For example, let p_left = 40% and p_right = 60%. Instead of the fixed interval width of 40 that you are using (-20, +20), you can introduce another parameter, the % of selected population, say 15%. This is the total population we want around the mode (including the mode itself). You can then divide this 15% in proportion to the left vs. right population:

left proportion = 15% x 40% = 6%
right proportion = 15% x 60% = 9%

You should correct these 6% and 9% for the mode's own share of the population by deducting half of it from each side. For example, if the mode accounts for 5% of the population, deduct 2.5% from both the 6% and the 9%. This gives the adjusted p_left and p_right:

p_left = 6% - 2.5% = 3.5%
p_right = 9% - 2.5% = 6.5%

Instead of dividing the interval evenly around the mode, you compute how many bins on the left and on the right need to be included to reach these percentages. For example, you may find that including 5 bins on the left adds up to 3.5% of the total population, and that 3 bins on the right give you approximately 6.5%.

So your range becomes (x - 5, x + 3), where x is the x-coordinate of the mode.
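A sketch of this computation (assuming hist is a 256-bin histogram of one channel and target_pct is the chosen % of selected population, 15% in the example; percentages are handled as fractions):

import numpy as np

def skewed_interval(hist, target_pct=0.15):
    total = hist.sum()
    x = int(np.argmax(hist))              # the mode
    mode_pct = hist[x] / total
    p_left = hist[:x].sum() / total       # population left of the mode
    p_right = hist[x + 1:].sum() / total  # population right of the mode
    # split the target in the left/right proportion, then deduct half
    # of the mode's own share from each side (the 3.5% and 6.5% above)
    want_left = max(0.0, target_pct * p_left - mode_pct / 2)
    want_right = max(0.0, target_pct * p_right - mode_pct / 2)
    # include bins outward from the mode until each side reaches its share
    lo = hi = x
    while lo > 0 and hist[lo:x].sum() / total < want_left:
        lo -= 1
    while hi < len(hist) - 1 and hist[x + 1:hi + 1].sum() / total < want_right:
        hi += 1
    return lo, hi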

Parameter estimation: to determine the right value for the mode % of population (the 15% in the example above), you can compute the histograms on a standard set of your masked images and use them to derive a good initial estimate: essentially, count the unmasked pixels in your masked images and divide by the total number of pixels.
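That estimate is just the overall skin fraction of the masks, e.g. (assuming MASKS is a list of single-channel binary masks with skin pixels non-zero):

import numpy as np

# fraction of skin pixels over the whole dataset: a starting value
# for the selected-population percentage (the 15% above)
skin_pct = (sum(np.count_nonzero(m) for m in MASKS)
            / sum(m.size for m in MASKS))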

Actually, finding the global optimum for a given dataset is not too complicated. For simplicity, let's first assume you have grayscale images, since each of the colors is treated independently (I believe). It would be a bit more complicated if you were scoring a pixel based on all 3 colors falling within the required intervals, but it seems you're not.

So anyway, you can just exhaustively check each interval for each image, depending on the size of your dataset. For instance, if each pixel only takes integer values in [0, 255], there are only on the order of 100 interval sizes you even need to consider. You can compute the accuracy for each candidate interval size and each image, and simply take the interval that yields the highest average accuracy; repeat across all colors. This is the brute-force approach for sure, but unless your dataset is quite large it shouldn't be computationally expensive using optimized matrix operations. If your dataset is huge, a sufficiently large random sample of images over which to use this technique would yield an approximate (though not globally optimal) solution.
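A sketch of that brute-force search for one channel (hypothetical names: channel_imgs are the single-channel images, gt_masks the boolean ground-truth masks; the intervals are kept symmetric around a fixed turning_point):

import numpy as np

def best_interval(channel_imgs, gt_masks, turning_point):
    # exhaustively try every symmetric interval width around the
    # turning point, keeping the highest mean accuracy over the dataset
    best_acc, best_iv = -1.0, None
    for d in range(128):
        lo, hi = max(0, turning_point - d), min(255, turning_point + d)
        acc = np.mean([np.mean(((ch >= lo) & (ch <= hi)) == gt)
                       for ch, gt in zip(channel_imgs, gt_masks)])
        if acc > best_acc:
            best_acc, best_iv = acc, (lo, hi)
    return best_iv, best_acc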

As an aside, the way you are currently computing the accuracy between mask and ground truth is quite inefficient. The rule of thumb is pretty much to always use numpy matrix operations when you can, because they are much more efficient (there are some clever algorithmic tricks for saving time on matrix operations, and they are written in C, so they are faster for that reason as well).

You can replace this:

for i in range(mask.shape[0]):
    for j in range(mask.shape[1]):
        if mask[i, j] == 255 and img[i, j, 0] == 255:
            TP += 1
        if mask[i, j] == 0 and img[i, j, 0] == 0:
            TN += 1
        if mask[i, j] == 255 and img[i, j, 0] == 0:
            FP += 1
        if mask[i, j] == 0 and img[i, j, 0] == 255:
            FN += 1

With the equivalent matrix operations (binarizing first, since the mask is 0/255 and the ground truth is read from channel 0):

# boolean views of prediction and ground truth
pred = mask == 255
truth = img[:, :, 0] == 255

TP = np.sum(pred & truth)
TN = np.sum(~pred & ~truth)
FP = np.sum(pred & ~truth)
FN = np.sum(~pred & truth)

This will save you time, especially if you use a brute-force approach like the one I suggested, but it is also good practice in general.
