
Black & white image document clustering

Original post 2017-11-23 19:51:45 2 1 python/ opencv/ machine-learning/ computer-vision/ cluster-analysis

I have some black & white documents (image scans) and want to cluster them according to their layout. To make things more concrete, say I have the following three images: the first two would more likely fall into the same cluster than the third, because the first two have relatively similar layouts.

My question is, what would be the best approach to clustering the documents? Right now I have a couple of initial approaches:

compute an image hash for each document and compare the hashes
use PCA plus a clustering technique (K-means) on the lower-dimensional representation
extract the text using OCR, derive text features, and compare them
extract the text using OCR and do a keyword search

Are there other, better approaches? Again, only the layout matters.
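For reference, the first idea (hash each image, then compare hashes) could look roughly like the sketch below. It is a minimal average-hash in pure Python, assuming each scan is already loaded as a list of rows of grayscale values (0-255). The names average_hash and hamming_distance and the hash_size parameter are illustrative, not taken from any particular library.

```python
def average_hash(image, hash_size=8):
    """Return a tuple of hash_size * hash_size bits for a 2D grayscale image.

    `image` is a list of rows of pixel intensities (0-255) and must be at
    least hash_size pixels wide and tall. The image is reduced to a
    hash_size x hash_size grid by block averaging; each bit records whether
    a block is brighter than the mean of all blocks.
    """
    height, width = len(image), len(image[0])
    blocks = []
    for by in range(hash_size):
        y0, y1 = by * height // hash_size, (by + 1) * height // hash_size
        for bx in range(hash_size):
            x0, x1 = bx * width // hash_size, (bx + 1) * width // hash_size
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            blocks.append(sum(pixels) / len(pixels))
    mean = sum(blocks) / len(blocks)
    return tuple(1 if block > mean else 0 for block in blocks)


def hamming_distance(hash_a, hash_b):
    """Count the bit positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))
```

A small Hamming distance between two hashes suggests a similar coarse distribution of ink on the page. Note that this still compares raw pixel content, so it reacts to text density as much as to layout.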

1 answer

Don't attempt to cluster raw data.

Clustering is unsupervised: it can't learn which properties are important and which are not. To a clustering algorithm, everything is important.

Instead, define layout-relevant features first, such as long edges.
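One possible reading of "long edges" is sketched below. It assumes the scan has already been binarized into a list of rows with 1 for ink and 0 for background. It keeps only ink runs of at least min_len pixels (rules, table borders, box outlines), ignores shorter runs such as text strokes, and summarizes where the long runs fall on a coarse grid. All function names, min_len, and cells are illustrative choices, not something the answer prescribes.

```python
def long_run_mask(binary, min_len):
    """Mark pixels that belong to a horizontal ink run of at least min_len."""
    mask = [[0] * len(row) for row in binary]
    for y, row in enumerate(binary):
        start = None
        # A trailing 0 acts as a sentinel that closes a run touching the edge.
        for x, value in enumerate(list(row) + [0]):
            if value and start is None:
                start = x
            elif not value and start is not None:
                if x - start >= min_len:
                    for i in range(start, x):
                        mask[y][i] = 1
                start = None
    return mask


def transpose(grid):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*grid)]


def grid_density(mask, cells):
    """Fraction of marked pixels in each cell of a cells x cells grid."""
    height, width = len(mask), len(mask[0])
    features = []
    for cy in range(cells):
        y0, y1 = cy * height // cells, (cy + 1) * height // cells
        for cx in range(cells):
            x0, x1 = cx * width // cells, (cx + 1) * width // cells
            area = (y1 - y0) * (x1 - x0)
            marked = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
            features.append(marked / area)
    return features


def layout_features(binary, min_len=8, cells=4):
    """Concatenate grid densities of long horizontal and long vertical runs."""
    horizontal = long_run_mask(binary, min_len)
    # Vertical runs are horizontal runs of the transposed image.
    vertical = transpose(long_run_mask(transpose(binary), min_len))
    return grid_density(horizontal, cells) + grid_density(vertical, cells)
```

The resulting vectors, rather than the raw pixels, would then be passed to whatever clustering algorithm is chosen, for example the K-means mentioned in the question. On real scans, a morphological opening with a long thin kernel in OpenCV should give a similar long-line mask much faster.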


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.


solution1 1 2017-11-24 00:55:33
