How to get the output in a CSV file? (Python and OCR)

Question

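This code searches a scanned file for specific keywords and outputs the words that follow them, but only to the console. Now I would like to output these results in CSV format. Can someone help me understand how to get the output as CSV?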

pytesseract.pytesseract.tesseract_cmd = "D:\\Users\\Dekt\\tesseract.exe"

Data = 'D:\\Users\\files\\example.pdf'
doc = convert_from_path(Data)
path, fileName = os.path.split(Data)
fileBaseName, fileExtension = os.path.splitext(fileName)

for page_number, page_data in enumerate(doc):
    txt = pytesseract.image_to_string(page_data, lang='deu').encode('utf-8')
    txt = txt.decode('utf-8')
    tokens = txt.split()

    if "Name" in tokens:
        location = tokens.index('Name')
        print("Name: " + (tokens[location + 1]) + " " + (tokens[location + 2]) + " " + (
        tokens[location + 3]))

Answer 1

Instead of using print(), open a file in write mode and write the content to it.

from pdf2image import convert_from_path
import os
import pytesseract

pytesseract.pytesseract.tesseract_cmd = "D:\\Users\\Dekt\\tesseract.exe"

filePath = 'D:\\Users\\files\\example.pdf'
doc = convert_from_path(filePath)
path, fileName = os.path.split(filePath)
fileBaseName, fileExtension = os.path.splitext(fileName)

# Open the output file once; the with-block closes it after the last page.
with open("myCSV.csv", "w", encoding="utf-8") as output:
    for page_number, page_data in enumerate(doc):
        # image_to_string already returns a str, so no encode/decode round trip is needed.
        txt = pytesseract.image_to_string(page_data, lang='deu')
        tokens = txt.split()

        # For each keyword, write the three words that follow it.
        if "Name" in tokens:
            location = tokens.index('Name')
            output.write("Name: " + tokens[location + 1] + " " + tokens[location + 2] + " " + tokens[location + 3] + ",")

        if "Date" in tokens:
            location = tokens.index('Date')
            output.write("Date is : " + tokens[location + 1] + " " + tokens[location + 2] + " " + tokens[location + 3] + ",")

        if "Adress" in tokens:
            location = tokens.index('Adress')
            output.write("Adress is : " + tokens[location + 1] + " " + tokens[location + 2] + " " + tokens[location + 3] + ",")

I added a comma at the end of each statement because I don't know exactly what format you are looking for.
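
If what you want is one row per page with a header line, Python's built-in csv module may be a better fit than writing commas by hand, since it quotes values that contain commas themselves. Below is a minimal sketch of that idea, not a drop-in replacement. The helper names (words_after, page_to_row, write_rows), the FIELDS list and the sample text are made up for illustration, and the sample string stands in for what pytesseract.image_to_string would return, so the snippet runs without Tesseract installed.

```python
import csv
import io

# Keywords taken from the code above; adjust to whatever appears in your scans.
FIELDS = ["Name", "Date", "Adress"]


def words_after(tokens, keyword, count=3):
    """Return up to `count` words following the first `keyword`, joined by spaces.

    Slicing avoids an IndexError when the keyword sits near the end of the page.
    """
    if keyword not in tokens:
        return ""
    start = tokens.index(keyword) + 1
    return " ".join(tokens[start:start + count])


def page_to_row(text):
    """Build one CSV row (a dict keyed by field name) from one page of OCR text."""
    tokens = text.split()
    return {field: words_after(tokens, field) for field in FIELDS}


def write_rows(rows, handle):
    """Write a header line plus one line per row to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample standing in for the OCR output of a single page.
sample_pages = [
    "Name Max Mustermann Jr. Date 10 Nov 2021 Adress Hauptstr. 5, Berlin",
]
rows = [page_to_row(text) for text in sample_pages]

# Written to an in-memory buffer here; a real file handle works the same way.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, open the file with open("myCSV.csv", "w", newline="", encoding="utf-8") and pass that handle to write_rows. In the real script, each text would come from pytesseract.image_to_string(page_data, lang='deu') inside the page loop.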

Solution 1
0 2021-11-10 17:51:11
