Pytesseract does not extract text from low-quality images

Question

I want to extract text from an image:

I have tried using the code below to extract the text:

from PIL import Image
import pytesseract

img = "Offers.png"
# Default page segmentation mode
tex = pytesseract.image_to_string(Image.open(img))
# --psm 6: assume a single uniform block of text
string = pytesseract.image_to_string(Image.open(img), config='--psm 6')

I could not extract the text. The tex variable returns an empty string, whereas the string variable returns a single line of text.

What can I do to extract the complete text from the pamphlet image?

EDIT 1:

Since the previously provided image was low quality, I'm now providing some random images from Google Images with comparatively better quality.

new image 2

new image 3

Now when I run the same code as above on these images, I am again unable to extract the complete text.

EDIT 2:

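import cv2
import pytesseract

# Load as grayscale (flag 0), then binarize with Otsu's threshold
img = cv2.imread('sale-banner-template-design_74379-121.jpg', 0)
thresh, im_bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Upscale 2x horizontally and 3x vertically; note this resizes the grayscale img, not im_bw
up_image = cv2.resize(img, None, fx=2, fy=3, interpolation=cv2.INTER_LINEAR)

t = pytesseract.image_to_string(up_image)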

Answer 1

Removing colour, cropping out unnecessary input, and upscaling the image all help Tesseract significantly. You can do all of this with PIL and its various modules.
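The three steps above could be sketched roughly as follows. This is a minimal illustration, not a tuned pipeline: the helper names preprocess_for_ocr and extract_text, the crop_box parameter, the scale factor of 2, and the choice of Lanczos resampling are all assumptions made for the example, and the right values will depend on the image.

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(image, crop_box=None, scale=2):
    """Return a grayscale, optionally cropped, upscaled copy of a PIL image.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Remove colour: convert to 8-bit grayscale ("L" mode).
    gray = ImageOps.grayscale(image)
    # Remove unnecessary input: keep only the region of interest, if given.
    if crop_box is not None:
        gray = gray.crop(crop_box)  # (left, upper, right, lower)
    # Upscale by the same factor on both axes to preserve the aspect ratio.
    new_size = (gray.width * scale, gray.height * scale)
    return gray.resize(new_size, Image.LANCZOS)


def extract_text(path, crop_box=None, scale=2, config="--psm 6"):
    """Preprocess the image at `path` and run Tesseract on the result."""
    # Imported here so the preprocessing helper works without Tesseract installed.
    import pytesseract

    with Image.open(path) as img:
        prepared = preprocess_for_ocr(img, crop_box, scale)
    return pytesseract.image_to_string(prepared, config=config)
```

Calling extract_text("Offers.png") would run the whole chain on the image from the question. Passing a crop_box around just the text region, or trying a different scale or --psm value, may be worth experimenting with if the output is still incomplete.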

Solution 1
0 2019-06-06 11:24:48
