Recognizing text in an image with pytesseract OCR

Question

I need to use Pytesseract to extract the text from this image:

and the code:

from PIL import Image, ImageEnhance, ImageFilter
import pytesseract
path = 'pic.gif'
img = Image.open(path)
img = img.convert('RGBA')
pix = img.load()
for y in range(img.size[1]):
    for x in range(img.size[0]):
        if pix[x, y][0] < 102 or pix[x, y][1] < 102 or pix[x, y][2] < 102:
            pix[x, y] = (0, 0, 0, 255)
        else:
            pix[x, y] = (255, 255, 255, 255)
img.save('temp.jpg')
text = pytesseract.image_to_string(Image.open('temp.jpg'))
# os.remove('temp.jpg')
print(text)

and "temp.jpg" is

Not bad, but the printed result is ,2 WW rather than the correct text 2HHH. How can I remove those black dots?

Answer 1

Here is my solution:

import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

im = Image.open("temp.jpg")  # the second image from the question
im = im.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)
im = im.convert('1')
im.save('temp2.jpg')
text = pytesseract.image_to_string(Image.open('temp2.jpg'))
print(text)
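
To illustrate why a median filter helps with the black dots, here is a minimal pure-Python sketch of a 3x3 median filter on a small grayscale grid. The helper name median_filter_3x3 and the toy grid are made up for illustration, and copying border pixels unchanged is a simplification; Pillow's MedianFilter may treat borders differently.

```python
from statistics import median


def median_filter_3x3(grid):
    """Replace each interior pixel with the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged (a simplification for this sketch).
    """
    height, width = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [grid[y + dy][x + dx]
                      for dy in (-1, 0, 1)
                      for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


# 255 = white background, 0 = black; one isolated black dot in the middle.
noisy = [[255] * 5 for _ in range(5)]
noisy[2][2] = 0

cleaned = median_filter_3x3(noisy)
print(cleaned[2][2])
```

An isolated dot is outvoted by its eight white neighbours, so it disappears, while strokes several pixels wide mostly survive.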

Answer 2

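To perform OCR on an image, preprocessing matters. Here is a simple approach using OpenCV and Pytesseract OCR. The idea is to obtain a processed image in which the text to extract is black and the background is white. To do this, we convert to grayscale, apply a slight Gaussian blur, then apply Otsu's threshold to get a binary image. From there, we apply a morphological operation to remove noise, and finally we invert the image. We run text extraction with the --psm 6 configuration option, which assumes a single uniform block of text. Tesseract offers other page segmentation modes as well.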
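
To show what Otsu's method computes, here is a rough pure-Python sketch that picks the threshold maximizing the between-class variance of a flat list of 8-bit gray values. The function name otsu_threshold and the sample pixel values are assumptions for illustration; in practice cv2.threshold with cv2.THRESH_OTSU does this for you.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_low = 0   # number of pixels with value <= t
    sum_low = 0      # sum of those pixel values
    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue              # no pixels in the low class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break                 # every pixel is already in the low class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Made-up sample: dark text pixels near 30, light background near 220.
pixels = [30] * 20 + [35] * 10 + [220] * 60 + [210] * 10
t = otsu_threshold(pixels)
binary = [255 if p > t else 0 for p in pixels]
print(t, binary.count(0), binary.count(255))
```

Note that this sketch maps dark pixels to 0, whereas THRESH_BINARY_INV in the answer's code produces the inverted result (text in white) so that the later morphological opening acts on the text as foreground.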

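Here is a visualization of each step:

Input image

Convert to grayscale -> Gaussian blur -> Otsu's threshold

Notice the tiny specks of noise. To remove them, we can apply a morphological operation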
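
As a sketch of what a morphological opening does to such specks, here is a pure-Python version with a 3x3 rectangular kernel on a binary grid (255 = foreground, 0 = background). The helper names and the toy grid are hypothetical, and ignoring out-of-bounds neighbours at the border is a simplification; cv2.morphologyEx with cv2.MORPH_OPEN is the real tool.

```python
def _window(grid, y, x):
    """Values in the 3x3 neighbourhood of (y, x), skipping out-of-bounds cells."""
    height, width = len(grid), len(grid[0])
    return [grid[j][i]
            for j in range(max(0, y - 1), min(height, y + 2))
            for i in range(max(0, x - 1), min(width, x + 2))]


def erode(grid):
    """A pixel stays foreground only if its whole neighbourhood is foreground."""
    return [[min(_window(grid, y, x)) for x in range(len(grid[0]))]
            for y in range(len(grid))]


def dilate(grid):
    """A pixel becomes foreground if any neighbour is foreground."""
    return [[max(_window(grid, y, x)) for x in range(len(grid[0]))]
            for y in range(len(grid))]


def opening(grid):
    """Erosion followed by dilation: removes specks smaller than the kernel."""
    return dilate(erode(grid))


# 7x9 grid with a 3x3 "stroke" and one isolated speck of noise.
noisy = [[0] * 9 for _ in range(7)]
for y in range(1, 4):
    for x in range(1, 4):
        noisy[y][x] = 255
noisy[5][7] = 255  # the speck

opened = opening(noisy)
print(opened[5][7], opened[2][2])
```

The erosion wipes out anything smaller than the kernel, and the dilation then restores the surviving shapes to their original extent.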

Finally, we invert the image

Result from Pytesseract OCR

2HHH

Code

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Morph open to remove noise and invert image
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
invert = 255 - opening

# Perform text extraction
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.imshow('invert', invert)
cv2.waitKey()

Answer 3

I'd like to offer the community a somewhat different pytesseract approach. Here it is:

import pytesseract
from PIL import Image
text = pytesseract.image_to_string(Image.open("temp.jpg"), lang='eng',
                        config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')

print(text)

Answer 4

To extract text directly from the web, you can try the following implementation (making use of the first image):

import io
import requests
import pytesseract
from PIL import Image, ImageFilter, ImageEnhance

response = requests.get('https://i.stack.imgur.com/HWLay.gif')
img = Image.open(io.BytesIO(response.content))
img = img.convert('L')
img = img.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(img)
img = enhancer.enhance(2)
img = img.convert('1')
img.save('image.jpg')
imagetext = pytesseract.image_to_string(img)
print(imagetext)

Answer 5

Here is my small improvement, which removes noise and arbitrary lines that fall within a certain color range.

import pytesseract
from PIL import Image, ImageEnhance, ImageFilter

im = Image.open(img)  # img is the path of the image 
im = im.convert("RGBA")
newimdata = []
datas = im.getdata()

for item in datas:
    if item[0] < 112 or item[1] < 112 or item[2] < 112:
        newimdata.append(item)
    else:
        newimdata.append((255, 255, 255, 255))  # opaque white, matching RGBA mode
im.putdata(newimdata)

im = im.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)
im = im.convert('1')
im.save('temp2.jpg')
# Tesseract 4+ expects --psm; the single-dash -psm form dates from Tesseract 3.x
text = pytesseract.image_to_string(Image.open('temp2.jpg'), lang='eng',
                                   config='-c tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyz --psm 6')
print(text)
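
The color rule in the loop above can be isolated into a small helper to make it easier to test on its own. This is only a sketch: the name whiten_light_pixels and the sample pixels are made up, and the cutoff of 112 is simply the value used in this answer.

```python
def whiten_light_pixels(pixels, cutoff=112):
    """Keep a pixel if any of its R, G, B channels is below `cutoff`;
    otherwise replace it with opaque white."""
    white = (255, 255, 255, 255)
    return [p if min(p[:3]) < cutoff else white for p in pixels]


sample = [
    (10, 10, 10, 255),     # dark text pixel: kept
    (200, 200, 200, 255),  # light gray noise: whitened
    (250, 100, 250, 255),  # one channel below the cutoff: kept
    (120, 130, 140, 255),  # all channels at or above the cutoff: whitened
]
print(whiten_light_pixels(sample))
```

The result could then be passed to im.putdata() in the same way as newimdata.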

Answer 6

You just need to enlarge the image with cv2.resize

image = cv2.resize(image,(0,0),fx=7,fy=7)

My 200x40 image -> HZUBS

The same image resized to roughly 1400x300 -> A 1234 (so this one is correct)
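
As a quick check on the sizes involved: when dsize is (0, 0), cv2.resize derives the output size from fx and fy. The sketch below reproduces that arithmetic in plain Python; scaled_size is a hypothetical helper, and the exact rounding OpenCV applies to fractional results may differ from Python's round.

```python
def scaled_size(width, height, fx, fy):
    """Output (width, height) for scale factors fx and fy."""
    return round(width * fx), round(height * fy)


print(scaled_size(200, 40, 7, 7))
```

For a 200x40 input and a factor of 7 this gives 1400x280.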

Then:

retval, image = cv2.threshold(image,200,255, cv2.THRESH_BINARY)
image = cv2.GaussianBlur(image,(11,11),0)
image = cv2.medianBlur(image,9)

and adjust the parameters to improve the result

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
            bypassing hacks that are Tesseract-specific.
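
One way to experiment with these modes is to build the config string programmatically and try a few candidates. The helper below is a hypothetical sketch, not part of pytesseract; it only assembles the --psm and tessedit_char_whitelist options that already appear in the answers here.

```python
PSM_MIN, PSM_MAX = 0, 13  # range of the modes listed above


def psm_config(mode, whitelist=None):
    """Build a Tesseract config string for a page segmentation mode,
    optionally restricting recognition to the characters in `whitelist`."""
    if not PSM_MIN <= mode <= PSM_MAX:
        raise ValueError(f"psm must be between {PSM_MIN} and {PSM_MAX}, got {mode}")
    parts = [f"--psm {mode}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


# Candidate modes for a short single-line image (an assumption about the input).
candidates = [psm_config(mode) for mode in (6, 7, 8)]
print(candidates)
```

Each string can then be passed as the config argument of pytesseract.image_to_string to compare results on the same image.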

Answer 7

from PIL import Image, ImageEnhance, ImageFilter
import pytesseract
path = 'hhh.gif'
img = Image.open(path)
img = img.convert('RGBA')
pix = img.load()
for y in range(img.size[1]):
    for x in range(img.size[0]):
        if pix[x, y][0] < 102 or pix[x, y][1] < 102 or pix[x, y][2] < 102:
            pix[x, y] = (0, 0, 0, 255)
        else:
            pix[x, y] = (255, 255, 255, 255)
text = pytesseract.image_to_string(img)  # OCR the thresholded image, not the original file
print(text)

7 answers

Answer 1 35 2016-06-10 14:19:54
Answer 2 27 2020-02-11 02:54:21
Answer 3 5 2018-12-20 12:01:45
Answer 4 3 2018-12-13 22:00:18
Answer 5 2 2018-06-14 07:41:05
Answer 6 0 2019-07-28 08:22:13
Answer 7 0 2021-12-13 11:13:56
