python-tesseract OCR: get only digits

Question

I'm using tesseract OCR with python-tesseract. The tesseract FAQ says the following about digits:

Use

TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");

BEFORE calling an Init function, or put this in a text file called tessdata/configs/digits:

tessedit_char_whitelist 0123456789

and then your command line becomes:

tesseract image.tif outputbase nobatch digits

Warning: Until the old and new config variables get merged, you must have the nobatch parameter too.
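For anyone who wants to script the file-based route quoted above, here is a rough Python sketch. The helper names write_digits_config and digits_command are made up for illustration, and the tessdata directory is whatever your installation uses; the file contents and the argument order are taken straight from the FAQ text.

```python
from pathlib import Path


def write_digits_config(tessdata_dir):
    """Write the FAQ's one-line whitelist into <tessdata_dir>/configs/digits."""
    config_path = Path(tessdata_dir) / "configs" / "digits"
    # Create the configs directory if it does not exist yet
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text("tessedit_char_whitelist 0123456789\n")
    return config_path


def digits_command(image, outputbase):
    """Build the argument list for the FAQ's command line."""
    # Same order as in the FAQ: image, output base, nobatch, config name
    return ["tesseract", image, outputbase, "nobatch", "digits"]
```

The list returned by digits_command("image.tif", "outputbase") could then be handed to subprocess.run, assuming the tesseract binary is on your PATH.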

In python-tesseract, the SetVariable method exists. I've tried this, but the result of the OCR is the same:

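import tesseract

api = tesseract.TessBaseAPI()
# Whitelist set before Init(), as the FAQ describes; the OCR result is unchanged
api.SetVariable("tessedit_char_whitelist", "0123456789")
api.Init('.', 'eng', tesseract.OEM_DEFAULT)
api.SetPageSegMode(tesseract.PSM_AUTO)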

Has anyone already got this working, or should I consider it a bug in python-tesseract?

Answer 1

OK, got it working. According to this (unofficial?) documentation of tesseract-ocr, SetVariable() must be called after Init(), even though the official FAQ says the opposite. Calling it after Init() works as intended.
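The corrected call order can be sketched as a small helper. The function name configure_digits_only and its parameters are my own, not part of python-tesseract; it only uses the three methods that already appear in the question (Init, SetVariable, SetPageSegMode) and takes the api object and mode constants as arguments, so nothing here is tied to a particular version of the bindings.

```python
DIGITS = "0123456789"


def configure_digits_only(api, oem_mode, psm_mode, datapath=".", lang="eng"):
    """Initialise `api`, then restrict recognition to digits.

    `api` is expected to behave like tesseract.TessBaseAPI() from the
    question; this helper itself is hypothetical.
    """
    # Init() first: a whitelist set before this call had no effect for me
    api.Init(datapath, lang, oem_mode)
    # Only now apply the whitelist
    api.SetVariable("tessedit_char_whitelist", DIGITS)
    api.SetPageSegMode(psm_mode)
    return api
```

With the question's setup this would be called as configure_digits_only(tesseract.TessBaseAPI(), tesseract.OEM_DEFAULT, tesseract.PSM_AUTO).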

Answer 1 (15, accepted, 2012-03-21 13:22:09)
