
Segmenting connected characters in an image

My aim is to correctly segment the characters in an image.

My image looks like this: [image]

How can I correctly segment the connected B and W? My code also seems to treat 7, 5 and 0 as one connected component. How do I segment them?

Which transformation must I apply? I tried erode, but it did not help. How is the kernel size selected for such an image? How should I remove the noise on the 5 and the M?

What changes should I make to my code to correctly segment and isolate every character? Code:

import cv2
import numpy as np

# Load the plate as grayscale and binarize it with Otsu's threshold
img = cv2.imread('C:\\xx\\testimages\\X\\plate4.jpg', 0)
_, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# findContours returns 3 values in OpenCV 3.x and 2 in 4.x; keep the last two
contours, hier = cv2.findContours(img.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]
# Sort contours left to right by the x coordinate of their bounding box
contours = sorted(contours, key=lambda ctr: cv2.boundingRect(ctr)[0])
d = 0
for ctr in contours:
    # Get bounding box
    x, y, w, h = cv2.boundingRect(ctr)
    # Boundary conditions to isolate a character
    if w > 20 and h > 20 and w < 60:
        print(x, y, w, h)
        # Getting ROI
        roi = img[y:y+h, x:x+w]
        #roi = cv2.resize(roi, (20, 35))
        #kernel = np.ones((3, 3), np.uint8)
        #roi = cv2.morphologyEx(roi, cv2.MORPH_CLOSE, kernel)
        #roi = cv2.erode(roi, kernel, iterations=1)
        #kernel_1 = np.ones((1, 1), np.uint8)
        #roi = cv2.dilate(roi, kernel, iterations=1)
        cv2.imshow('character: %d' % d, roi)
        cv2.imwrite('C:\\xx\\ValidationSet\\character_%d.png' % d, roi)
        cv2.waitKey(0)
        cv2.destroyAllWindows()
        d += 1

As people already recommended in the comments, the best option is to use morphological transforms such as erosion and opening. Regarding the kernel size, you can make it a function of the width of your contours, or just iterate multiple times with a small kernel of size (3, 3) or (5, 5). I personally found this tutorial (in C++, but the concept is the same) simple and useful.

Regarding noise removal from the "5" and "M", you can't do that in a generic way that will generalize to all possible types of noise. If you have more information on the statistics of the errors, or some other prior knowledge (e.g. the "noise" always manifests as white pixels on top of the current character), it becomes much more doable.
