
Pytesseract does not extract text from low-quality image

I want to extract text from an image:

image

I have tried using the code below to extract the text:

from PIL import Image
import pytesseract
img = "Offers.png"
tex = pytesseract.image_to_string(Image.open(img))
string = pytesseract.image_to_string(Image.open(img), config='--psm 6')

I could not extract the text. The tex variable is an empty string, whereas the string variable contains only a single line of text.

What can I do to extract the complete text from the pamphlet image?

EDIT 1:

Since the previously provided image was low quality, I'm now providing some random images from Google Images with comparatively better quality.

new image 2

new image 3

When I run the same code as above on these images, I am again unable to extract the complete text.

EDIT 2:

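import cv2
import pytesseract

# Load as grayscale (flag 0), then binarize with Otsu's threshold.
img = cv2.imread('sale-banner-template-design_74379-121.jpg', 0)
thresh, im_bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Upscale 2x horizontally and 3x vertically. Note: this resizes the grayscale img, not im_bw.
up_image = cv2.resize(img, None, fx=2, fy=3, interpolation=cv2.INTER_LINEAR)

t = pytesseract.image_to_string(up_image)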

Remove the colour and any unnecessary input, and upscale the image. This helps Tesseract significantly. You can do all of this with PIL and its various modules.
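The steps described above (dropping colour, upscaling, and reducing the image to black and white) could be sketched with PIL roughly as follows. This is only a minimal illustration, not the answerer's exact method: the helper names preprocess_for_ocr and extract_text, the uniform scale factor of 2, and the fixed threshold of 128 are all assumptions chosen for the example, and the right values will depend on the image.

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(img, scale=2, threshold=128):
    """Return a grayscale, upscaled, black-and-white copy of a PIL image.

    `scale` and `threshold` are illustrative defaults, not tuned values.
    """
    gray = img.convert("L")                     # remove colour
    gray = ImageOps.autocontrast(gray)          # stretch the contrast range
    new_size = (gray.width * scale, gray.height * scale)
    big = gray.resize(new_size, Image.LANCZOS)  # upscale uniformly
    # Fixed global threshold: brighter pixels become white, the rest black.
    return big.point(lambda p: 255 if p > threshold else 0)


def extract_text(path, config="--psm 6"):
    """Hypothetical wrapper: preprocess the file at `path`, then run OCR."""
    import pytesseract  # imported here so preprocessing works without it

    with Image.open(path) as img:
        return pytesseract.image_to_string(preprocess_for_ocr(img), config=config)
```

Calling extract_text("Offers.png") would then run Tesseract on the cleaned-up image. A fixed threshold is the simplest choice here; for unevenly lit or multi-coloured banners, Otsu's method (as in the OpenCV snippet) or a per-region threshold may work better.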
