
How to cluster a set of image files to different folders based on their content

I have a set of images in a folder, where each image shows either a square or a triangle on a white background (like this and this). I would like to separate those images into different folders. Note that I don't care about detecting whether an image contains a square, a triangle, etc.; I just want to separate the two groups.

I am planning to use more complex shapes in the future (e.g., pentagons, or non-geometric shapes), so I am looking for an unsupervised approach. But the main task will always be clustering a set of images into different folders.

What is the easiest/best way to do it? I looked at image clustering algorithms, but they cluster the colors/shapes inside a single image. In my case I simply want to separate the image files based on the shapes they contain.

Any pointers/help is appreciated.

You can follow this method:

1. Create a look-up table of the shapes used in your images (one template per shape).
2. Run template matching on the images stored in the single folder.
3. Based on the template-matching result, store each image in the corresponding folder.
4. You can create the folders beforehand and just change the path strings in the program as needed.
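The steps above could be sketched roughly as follows. This is only a minimal illustration, assuming every image and template has already been loaded as a same-sized grayscale NumPy array (for example with Pillow or OpenCV). The function names `match_score`, `best_template`, and `copy_to_folders` are made up for this example.

```python
import shutil
from pathlib import Path

import numpy as np


def match_score(image: np.ndarray, template: np.ndarray) -> float:
    """Normalized cross-correlation of two same-sized grayscale arrays.

    Assumption: image and template have identical shapes, so no sliding
    window search is performed.
    """
    a = image.astype(float) - image.mean()
    b = template.astype(float) - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def best_template(image: np.ndarray, templates: dict) -> str:
    """Return the name of the template (look-up table entry) that scores highest."""
    return max(templates, key=lambda name: match_score(image, templates[name]))


def copy_to_folders(assignments: dict, out_dir: str) -> None:
    """Copy each file into out_dir/<label>/, creating the folders as needed.

    `assignments` maps a file path to the label chosen by best_template.
    """
    for path, label in assignments.items():
        target = Path(out_dir) / label
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target / Path(path).name)
```

Note that this needs one template per shape up front, so it is closer to classification than to the unsupervised clustering asked for in the question.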

I hope this helps

It's really going to depend on what your data set looks like (e.g., what your shape images look like) and how robust you want your solution to be. The tricky part is going to be extracting features from each shape image that produce a clustering result you're satisfied with. A few ideas:

You could compute SIFT features for each image and then cluster the images based on those features: http://en.wikipedia.org/wiki/Scale-invariant_feature_transform

If you don't want to go the SIFT route, you could try something like HOG: http://en.wikipedia.org/wiki/Histogram_of_oriented_gradients
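To give a feel for the gradient-histogram idea, here is a rough sketch. It is not full HOG (there are no cells and no block normalization), just one magnitude-weighted orientation histogram per image, followed by a tiny k-means. It assumes grayscale NumPy arrays as input, and `orientation_histogram` and `kmeans` are names invented for this example.

```python
import numpy as np


def orientation_histogram(image: np.ndarray, bins: int = 9) -> np.ndarray:
    """Single-cell, HOG-like descriptor: a histogram of unsigned gradient
    orientations (0-180 degrees), weighted by gradient magnitude and
    L2-normalized. A simplification of real HOG, for illustration only."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0), weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


def kmeans(features: np.ndarray, k: int = 2, iterations: int = 20) -> np.ndarray:
    """Very small k-means with a deterministic farthest-point initialization."""
    centers = [features[0]]
    while len(centers) < k:
        dists = np.min(
            [np.linalg.norm(features - c, axis=1) for c in centers], axis=0
        )
        centers.append(features[int(np.argmax(dists))])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iterations):
        # Assign every feature vector to its nearest center.
        distances = np.linalg.norm(
            features[:, None, :] - centers[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels
```

An axis-aligned square puts nearly all of its gradient energy into the 0 and 90 degree bins, while a triangle with a slanted edge also fills a diagonal bin, which is what lets the two groups separate. A descriptor this crude is sensitive to rotation, so treat it as a starting point only.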

A somewhat more naive approach: if the shapes are always the same scale and the background color is fixed, you could get rid of the background and cluster the images based on shape area (e.g., the number of pixels taken up by the shape).
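A minimal sketch of that area-based idea, assuming grayscale NumPy arrays with a near-white (255) background. The `tolerance` parameter and the "split at the largest gaps" grouping rule are assumptions made for this example, and `shape_area` and `split_by_area` are made-up names.

```python
import numpy as np


def shape_area(image: np.ndarray, background: int = 255, tolerance: int = 10) -> int:
    """Count the pixels that differ from the (assumed fixed) background value."""
    diff = np.abs(image.astype(int) - background)
    return int(np.count_nonzero(diff > tolerance))


def split_by_area(areas, groups: int = 2) -> np.ndarray:
    """Group 1-D area values by cutting the sorted list at its largest gaps.

    Returns one integer label per input; label 0 is the smallest-area group.
    """
    areas = np.asarray(areas)
    labels = np.zeros(len(areas), dtype=int)
    if groups < 2 or len(areas) < groups:
        return labels
    order = np.argsort(areas)
    gaps = np.diff(areas[order])
    # Positions of the (groups - 1) largest gaps in the sorted sequence.
    cut_points = np.sort(np.argsort(gaps)[-(groups - 1):])
    for cut in cut_points:
        # Everything after a cut moves up one group.
        labels[order[cut + 1:]] += 1
    return labels
```

For example, `split_by_area([16, 10, 17, 9])` puts the two larger areas in one group and the two smaller ones in the other. This only works while different shapes really do have clearly different areas.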


 