
How do I check for output of tesseract in a bash script?

I am running a loop in a bash script and passing png files to tesseract to read the text in the images. If the output of the tesseract OCR is Empty page!! or nothing at all, I want the loop to proceed to the next image. If it does include text, I want to store the output in a text file.

This is what my basic script looks like:

for i in {1..100}
do
  tesseract "file-${i}.png" stdout >> result.txt
done

This is roughly what you need. I took the liberty of iterating over the png files in a directory with a glob, rather than counting from 1 to 100:

for file in /my/directory/*.png
do
  # Redirect output to a variable. This works even if output is multiline.
  output="$(tesseract "$file" stdout)"
  
  if [ -n "$output" ] && [ "$output" != "Empty page!!" ]
  then
    echo "$output" >> result.txt
  fi
done 

This is a bit rough: you may want to check the exit status of tesseract in case there are errors, or discard its standard error messages, and so on. But it should give you the idea.
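The refinements mentioned above (checking the exit status and discarding standard error) could look roughly like the sketch below. The function name collect_text and its argument order are hypothetical, chosen only for illustration. It assumes tesseract returns a non-zero exit status on failure, and it treats whitespace-only output the same as empty output.

```bash
#!/usr/bin/env bash

# collect_text OUTFILE IMAGE...
# (hypothetical helper) Appends the recognized text of each image to
# OUTFILE, skipping images for which tesseract fails or yields no text.
collect_text() {
  local outfile=$1
  shift
  local file output
  for file in "$@"; do
    # Skip unmatched globs and missing files.
    [ -e "$file" ] || continue

    # Capture stdout only and discard diagnostics written to stderr.
    # Assumption: a non-zero exit status means the image could not be read.
    if ! output=$(tesseract "$file" stdout 2>/dev/null); then
      echo "skipping $file: tesseract failed" >&2
      continue
    fi

    case $output in
      "Empty page!!") ;;   # literal message, in case it appears on stdout
      *[![:space:]]*)      # at least one non-whitespace character
        printf '%s\n' "$output" >> "$outfile" ;;
    esac
  done
}
```

It could then be called as collect_text result.txt /my/directory/*.png. Passing the images as arguments keeps the glob at the call site, so the same function also works with an explicit list of files.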


 