
How to get the output in a CSV file? (Python and OCR)

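This code searches a scanned file for specific keywords and outputs the words that follow them, but only to the console. I would like to write these results to a CSV file instead. Can someone help me understand how to get the output in CSV format?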

import os

import pytesseract
from pdf2image import convert_from_path

# Path to the Tesseract executable (the attribute is tesseract_cmd)
pytesseract.pytesseract.tesseract_cmd = "D:\\Users\\Dekt\\tesseract.exe"

Data = 'D:\\Users\\files\\example.pdf'
doc = convert_from_path(Data)  # one PIL image per PDF page
path, fileName = os.path.split(Data)
fileBaseName, fileExtension = os.path.splitext(fileName)

for page_number, page_data in enumerate(doc):
    # image_to_string already returns a str, so no encode/decode round trip is needed
    txt = pytesseract.image_to_string(page_data, lang='deu')
    tokens = txt.split()

    if "Name" in tokens:
        location = tokens.index('Name')
        # Print up to three words that follow the keyword
        print("Name: " + " ".join(tokens[location + 1:location + 4]))

Instead of using print(), open a file in write mode and write the content to that file.

from pdf2image import convert_from_path
import os
import pytesseract

pytesseract.pytesseract.tesseract_cmd = "D:\\Users\\Dekt\\tesseract.exe"

filePath = 'D:\\Users\\files\\example.pdf'
doc = convert_from_path(filePath)  # one PIL image per PDF page
path, fileName = os.path.split(filePath)
fileBaseName, fileExtension = os.path.splitext(fileName)

# The with block closes the file even if an error occurs while processing
with open("myCSV.csv", "w", encoding="utf-8") as output:
    for page_number, page_data in enumerate(doc):
        # image_to_string already returns a str, so no encode/decode is needed
        txt = pytesseract.image_to_string(page_data, lang='deu')
        tokens = txt.split()

        # For each keyword found, write up to three of the words that follow it
        if "Name" in tokens:
            location = tokens.index('Name')
            output.write("Name: " + " ".join(tokens[location + 1:location + 4]) + ",")

        if "Date" in tokens:
            location = tokens.index('Date')
            output.write("Date is : " + " ".join(tokens[location + 1:location + 4]) + ",")

        if "Adress" in tokens:
            location = tokens.index('Adress')
            output.write("Adress is : " + " ".join(tokens[location + 1:location + 4]) + ",")

I added a comma at the end of each statement because I don't know exactly what format you are looking for.
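If the goal is one row per page with one column per keyword, Python's standard csv module can handle the separators and quoting. The following is only a sketch under assumptions: the helper names words_after and write_rows, the column layout, and the sample text are hypothetical and not part of the original code. The list of page strings stands in for the text returned by pytesseract.image_to_string.

```python
import csv
import io


def words_after(tokens, keyword, count=3):
    """Return up to `count` tokens following the first `keyword`, joined by spaces.

    Hypothetical helper: returns an empty string if the keyword is absent.
    """
    if keyword not in tokens:
        return ""
    start = tokens.index(keyword) + 1
    return " ".join(tokens[start:start + count])


def write_rows(pages, out_file, keywords=("Name", "Date", "Adress")):
    """Write a header row, then one CSV row per page of OCR text."""
    writer = csv.writer(out_file)
    writer.writerow(["page", *keywords])
    for page_number, text in enumerate(pages, start=1):
        tokens = text.split()
        writer.writerow([page_number] + [words_after(tokens, k) for k in keywords])


# Made-up sample text standing in for one page of OCR output
sample_pages = ["Name Max Mustermann Berlin Date 01.02.2020 bis Ende"]
buffer = io.StringIO()
write_rows(sample_pages, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with open("myCSV.csv", "w", newline="", encoding="utf-8"). The csv documentation recommends newline="" so that row endings are not translated twice on Windows.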

