
Extracting text out of images

I am working on extracting text out of images.

Initially the images are colored, with the text in white. After further processing, the text is shown in black and the other pixels are white (with some noise). Here is a sample:

Now when I run OCR on it using pytesseract (Tesseract), I still do not get any usable text.

Is there any way to extract text from colored images?

from PIL import Image
import pytesseract
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())

# load the image (OpenCV returns it in BGR order; no grayscale conversion is done here)
image = cv2.imread(args["image"])
cv2.imshow("Original", image)

# apply an "average" blur to the image
blurred = cv2.blur(image, (3, 3))
cv2.imshow("Blurred_image", blurred)

# run OCR on the blurred image
img = Image.fromarray(blurred)
text = pytesseract.image_to_string(img, lang='eng')
print(text)
cv2.waitKey(0)

As a result I get: "Stay: in an Overwoter Bungalow $3»"

What about using contours to find and remove the unnecessary blobs? It might work.

Try this one:

import uuid

import cv2
import ftfy
import pytesseract
from PIL import Image

filename = 'uTGi5.png'
image = cv2.imread(filename)
# keep only the bright pixels (applied per channel, since the image is loaded in color)
binary = cv2.threshold(image, 200, 255, cv2.THRESH_BINARY)[1]
# upscale 3x, then median-blur to suppress small specks
binary = cv2.resize(binary, (0, 0), fx=3, fy=3)
binary = cv2.medianBlur(binary, 9)
# save the result under a unique name and run OCR on the saved file
filename = str(uuid.uuid4()) + ".jpg"
cv2.imwrite(filename, binary)
config = "-l eng --oem 3 --psm 11"
text = pytesseract.image_to_string(Image.open(filename), config=config)
# repair encoding glitches and rejoin words hyphenated across line breaks
text = ftfy.fix_text(text)
text = ftfy.fix_encoding(text)
text = text.replace('-\n', '')
print(text)


 