Python: convert a color image to black text on a white background for OCR

Question

I have an image that I need to run OCR (Optical Character Recognition) on to extract all of its data.

First, I want to convert the color image to black text on a white background to improve OCR accuracy.

I tried the code below:

from PIL import Image
img = Image.open("data7.png")
img.convert("1").save("result.jpg")

It gave me the unclear image below:

I expect to get this image:

Then I will use pytesseract to get a dataframe:

import pytesseract as tess
file = Image.open("data7.png")
text = tess.image_to_data(file,lang="eng",output_type='data.frame')
text

Finally, the dataframe I want to get looks like this:

Answer 1

Converting an RGB image to a binary image with PIL.Image.convert produced an "unclear" image because of the default dithering. In your case you do not want to dither at all:

img.convert("1", dither=Image.Dither.NONE)

This will give you a clean conversion:

You still need to figure out how to capture the colored text, but the noise is gone once you turn off dithering.

Answer 2

Here's a vanilla Pillow solution. Just grayscaling the image gives okay results, but the green text is too faint.

So we first scale the green channel up (sure, it might clip, but that's not a problem here), then grayscale, invert, and auto-contrast the image.

from PIL import Image, ImageOps

img = Image.open('rqDRe.png').convert('RGB')

r, g, b = img.split()

img = Image.merge('RGB', (
    r,
    g.point(lambda i: i * 3),  # brighten green channel
    b,
))

img = ImageOps.autocontrast(ImageOps.invert(ImageOps.grayscale(img)), 5)

img.save('rqDRe_processed.png')

Output: [image]

Answer 3

You can find the background color by taking the most frequent color in the input image, using Torchvision to get at the pixel values.

More specifically, you can use torchvision.transforms.functional.to_tensor:

>>> img = Image.open("test.png")
>>> tensor = TF.to_tensor(img)

Extract the background color:

>>> u, c = tensor.flatten(1).unique(dim=1, return_counts=True)
>>> bckg = u[:, c.argmax()]
>>> bckg
tensor([0.1216, 0.1216, 0.1216])

Get the background mask:

>>> mask = (tensor.permute(1,2,0) == bckg).all(dim=-1)

Convert back to PIL with torchvision.transforms.functional.to_pil_image:

>>> res = TF.to_pil_image(mask.float())

Then you can extract the data frame using pytesseract:

>>> text = tess.image_to_data(res, lang="eng", output_type='data.frame')

Using from PIL import Image and import torchvision.transforms.functional as TF.
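For readers without PyTorch installed, the same idea can be sketched with NumPy in place of Torchvision. This is a swapped-in library, not the answer's method; the helper name background_mask and the tiny test array are assumptions for illustration.

```python
import numpy as np
from PIL import Image


def background_mask(img):
    """Return an 'L' image: white where a pixel equals the most frequent color, black elsewhere."""
    pixels = np.asarray(img.convert("RGB"))  # shape (H, W, 3), dtype uint8
    # Count how often each distinct RGB triple occurs.
    colors, counts = np.unique(pixels.reshape(-1, 3), axis=0, return_counts=True)
    background = colors[counts.argmax()]
    # True wherever all three channels match the background color exactly.
    mask = (pixels == background).all(axis=-1)
    return Image.fromarray(mask.astype(np.uint8) * 255)


# Hypothetical test image: dark gray background with a small green patch as "text".
arr = np.full((10, 10, 3), 31, dtype=np.uint8)
arr[2:4, 2:6] = (0, 200, 0)
out = background_mask(Image.fromarray(arr))
print(out.getpixel((0, 0)), out.getpixel((3, 2)))
```

Because the comparison is an exact match, any pixel that is not precisely the background color, including anti-aliased text edges and compression artifacts, comes out black.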

Answer 1
3 2022-07-18 06:27:13

Answer 2
3 accepted 2022-07-18 06:39:50

Answer 3
1 2022-07-18 06:21:34