The idea: I have one screenshot and want to find all characters and numbers in it, together with their positions. The easiest way is to use OpenCV's matchTemplate and compare every character template I have (around 800 .png files) against the screenshot.
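import os
import cv2
import numpy as np
myTemplatesPath = "C:/MyPath/Templates/"
# Collect every template file (one .png per character/number and font size)
allTemplateFiles = [os.path.join(root, name) for root, dirs, files in os.walk(myTemplatesPath) for name in files]
Templates_all = [cv2.imread(f, cv2.IMREAD_GRAYSCALE) for f in allTemplateFiles]
imgrey = cv2.cvtColor(screenshot, cv2.COLOR_BGR2GRAY)  # screenshot: BGR image, e.g. from cv2.imread
matches = []  # (template file, x, y) of every near-perfect match
for path, template in zip(allTemplateFiles, Templates_all):
    results = cv2.matchTemplate(imgrey, template, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(results > 0.99)  # top-left corners of matches
    matches.extend((path, x, y) for x, y in zip(xs, ys))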
[Image: the screenshot]
[Image: templates with different font sizes (just some examples)]
This works 100% reliably. The only problem is speed: finding all positions takes about 6 s, because each of the 800 templates has to be compared against the image. I would like to reduce this time.
I had several ideas to improve this speed:
So I'm still looking for a way to find the locations of the characters that is just as reliable (100%) but faster. (I prefer idea number 3, but I'm open to any suggestion.)
I stumbled upon this question and saw that it has no answer, so I will try to answer it. Hopefully it will be useful to you or someone else.
I had a similar problem in the past and used option 3. I also ran into the problem you described, where several letters are detected as one region. I fixed it by first checking whether the size of each region was in an acceptable range (all my letters/numbers had a similar size); if it was not, I tried to separate the letters using cv2.connectedComponents. This works as long as no two letters touch each other.
However, this required a lot of fine-tuning to work 100% for my use case. My problem was not only performance, though, but also failure to recognize some letters even with the PNGs of all letters. Since you mentioned that you can already recognize the letters, maybe you can detect words first and then run your code on each word. I think you can easily detect words using dilation (a morphological operation) and then run your code for each detected word. This should reduce the time to an acceptable range. If all images look like the one you provided, maybe you can simply divide the image into 9 sub-regions and run your code on each.
Other optimizations I had to use that might be useful are: