Python OCR for a Sudoku image

Question

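I searched and found the following Python code, but it does not return the result I expected. I need to use OCR to convert the digits in a Sudoku image and read them into a grid.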

import cv2
from imutils import contours
import numpy as np

# Load image, grayscale, and adaptive threshold
image = cv2.imread('Sample.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,57,5)

# Filter out all numbers and noise to isolate only boxes
cnts = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    area = cv2.contourArea(c)
    if area < 1000:
        cv2.drawContours(thresh, [c], -1, (0,0,0), -1)

# Fix horizontal and vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,5))
thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, vertical_kernel, iterations=9)
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,1))
thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, horizontal_kernel, iterations=4)

# Sort by top to bottom and each row by left to right
invert = 255 - thresh
cnts = cv2.findContours(invert, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
(cnts, _) = contours.sort_contours(cnts, method="top-to-bottom")

sudoku_rows = []
row = []
for c in cnts:
    area = cv2.contourArea(c)
    if area < 50000:
        row.append(c)
        # Count only the cells that pass the area filter, not every contour
        if len(row) == 9:
            (sorted_row, _) = contours.sort_contours(row, method="left-to-right")
            sudoku_rows.append(sorted_row)
            row = []

# Iterate through each box
for row in sudoku_rows:
    for c in row:
        mask = np.zeros(image.shape, dtype=np.uint8)
        cv2.drawContours(mask, [c], -1, (255,255,255), -1)
        result = cv2.bitwise_and(image, mask)
        result[mask==0] = 255
        cv2.imshow('result', result)
        cv2.waitKey(175)

cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.waitKey()

I don't know how to approach a problem like this. I am a beginner, so please bear with me. Here is a sample of the image.

Answer 1

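The best I can do on the CLI side is to run the image through any converter into PNM format, which is the preferred input for most OCR applications. However, most OCR applications convert to plain text, and the 7 may sometimes be read as a T (in this simplified case that is easy to fix with find and replace).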
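The find-and-replace step mentioned above could be sketched in Python roughly like this. It is only a minimal sketch, assuming the OCR tool has already produced plain text with one grid row per line and a `.` for each empty cell; the names `CONFUSIONS`, `clean_ocr_row` and `parse_ocr_text` are hypothetical and not part of any OCR tool.

```python
# Hypothetical table of characters that OCR may confuse with digits.
# Only the T -> 7 case comes from the answer; extend it as needed.
CONFUSIONS = {"T": "7"}


def clean_ocr_row(line, empty="."):
    """Turn one line of raw OCR text into a list of ints (0 = empty cell)."""
    cells = []
    for ch in line:
        ch = CONFUSIONS.get(ch, ch)  # apply the find-and-replace fix
        if ch in "123456789":
            cells.append(int(ch))
        elif ch == empty:
            cells.append(0)
        # Anything else (spaces, stray marks) is ignored.
    return cells


def parse_ocr_text(text, empty="."):
    """Parse multi-line OCR output into a list of rows, skipping blank lines."""
    rows = [clean_ocr_row(line, empty) for line in text.splitlines()]
    return [row for row in rows if row]
```

This only works when the OCR output marks empty cells in some way. If empty cells are simply dropped, the column information is lost and no text cleanup can restore it.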

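The bigger obstacle is that OCR, like PDF, has no concept of indentation or margins, so for now we get this output, and no correction to the character spacing helps.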

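So your solution may depend on turning the image into vector positions by converting it to PDF XY positions, and then using PDF OCR to try to recover the character layout from the PDF extraction results.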
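The idea of recovering the layout from XY positions can also be sketched directly in Python, without the PDF round trip. This is a minimal sketch under stated assumptions: some OCR step has already returned each recognized digit together with the pixel coordinates of its center, and the bounding box of the whole grid is known. The function `place_digits` and its input format are hypothetical, not an API of any library.

```python
def place_digits(detections, left, top, width, height, size=9):
    """Map (digit, x, y) detections onto a size x size grid (0 = empty).

    detections: iterable of (digit, x, y), where x, y is the center of the
                recognized character in pixels (assumed input format).
    left, top, width, height: bounding box of the whole grid in pixels.
    """
    grid = [[0] * size for _ in range(size)]
    cell_w = width / size
    cell_h = height / size
    for digit, x, y in detections:
        col = int((x - left) // cell_w)
        row = int((y - top) // cell_h)
        # Skip detections that fall outside the grid (noise, borders).
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = int(digit)
    return grid
```

For example, with a 450 by 450 pixel grid whose top-left corner is at (10, 20), a digit centered at (35, 45) lands in row 0, column 0. Because each cell is filled from its position rather than from text order, empty cells stay empty instead of collapsing the row.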

Python libraries have dataframe-based solutions that try to preserve table positions, but I don't use Python, so I can't recommend which one does this well.
