Printing a single line in a multi-line string

I managed to use pytesseract to convert an invoice image into text.

The multi-line string looks like this:

Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00

I would like to extract the invoice number, just the number (i.e. 20191220.001), using a substring. I managed to get the start index through index = string.find('Receipt No: '), but when I slice the string to extract the number with print(string[index:]), I get the following result:

20191220.001
Date: 20 December 2019
Invoice amount: $400.00

But I only want to extract the first line. Invoice numbers are not fixed at 12 characters; there might be more or fewer depending on the vendor. How do I extract only the invoice number? I'm doing this to automate an accounting process.

You can use split:

s = '''Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00'''

number = s.split('Receipt No: ')[1].split('\n')[0]
print(number)

Output:

20191220.001

Or, if you want to use find, you can do it this way:

index1 = s.find(':')
index2 = s.find('\n')
print(s[index1+1:index2].strip())

Try:

import re
s = """
Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00"""
# A raw string avoids invalid-escape warnings; \. matches a literal dot
p = re.compile(r"Receipt No: (\d+\.\d+)")
result = p.search(s)
number = result.group(1)  # '20191220.001'
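
Since the question says the invoice number's length varies by vendor, a looser pattern may be more useful than digits-dot-digits. The following is only a sketch: the function name find_receipt_number is hypothetical, and it assumes the number contains no spaces and sits on the same line as the label.

```python
import re

def find_receipt_number(text):
    """Return the token after 'Receipt No:' on the same line, or None."""
    # [ \t]* skips spaces/tabs only, so the match cannot spill onto the next line.
    # \S+ takes any run of non-whitespace characters, whatever its length.
    match = re.search(r"Receipt No:[ \t]*(\S+)", text)
    return match.group(1) if match else None

s = """Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00"""
print(find_receipt_number(s))
```

Returning None when nothing matches avoids the AttributeError that result.group(1) would raise on a failed search.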

Split your string on "\n" to get a list. Each part of the string separated by a newline becomes a list element, and you can then take the part you want.

string = """Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00"""

your_list = string.split("\n")
data = your_list[0]

You can try the split function.

with open("filename", 'r') as dataload:
    for i in dataload.readlines():
        if "Receipt No:" in i:
            print(i.split(":")[1].strip())

Output:

20191220.001

You can change the string tested in if "Receipt No:" in i: to suit your requirement.
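
To make the label configurable, the loop can be wrapped in a small function. This is a sketch under assumptions: first_value_for_label is a hypothetical name, and io.StringIO stands in for the opened file so the example runs without a file on disk. It splits on the label itself rather than on ":", so a value that contains a colon is kept whole.

```python
import io

def first_value_for_label(lines, label="Receipt No:"):
    """Return the text after `label` on the first line containing it, or None."""
    for line in lines:
        if label in line:
            # maxsplit=1 keeps everything after the first occurrence of the label
            return line.split(label, 1)[1].strip()
    return None

# StringIO behaves like an open text file: iterating it yields lines
fake_file = io.StringIO(
    "Receipt No: 20191220.001\nDate: 20 December 2019\nInvoice amount: $400.00\n"
)
print(first_value_for_label(fake_file))
```

With a real file, the same call would be first_value_for_label(dataload) inside the with block.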

If you only care about the first line, you can use the first occurrence of the line-ending character as the end of your number. Notice that the start of your number is the end of the substring ("Receipt No: "), while the find function returns the start of the substring.

string = '''Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00'''
sub = 'Receipt No: '
start = string.find(sub) + len(sub)
end = string.find('\n')
print(string[start:end])
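
If the receipt line is not guaranteed to come first, the newline search can begin at start instead of at the beginning of the string. A minimal sketch, assuming a hypothetical helper named slice_after; it also handles the cases where find returns -1.

```python
def slice_after(text, sub="Receipt No: "):
    """Return the rest of the line that follows `sub`, or None if `sub` is absent."""
    pos = text.find(sub)
    if pos == -1:
        return None
    start = pos + len(sub)
    # Search for the newline from `start`, so earlier lines are ignored
    end = text.find("\n", start)
    if end == -1:
        # No newline after the label: the value runs to the end of the text
        end = len(text)
    return text[start:end]

print(slice_after("Vendor: ACME\nReceipt No: 20191220.001\nDate: 20 December 2019"))
```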

If you also care about the other lines, you can use split and process each line separately.

lines = string.split('\n')
sub = 'Receipt No: '
index = lines[0].find(sub) + len(sub)
print(lines[0][index:])
# Process line 1
# Process line 2
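
One way to process every line is to collect the "label: value" pairs into a dictionary. This is only a sketch: parse_fields is a hypothetical name, and it assumes each useful line has the form label, colon, value, which holds for the sample text but may not hold for every OCR result.

```python
def parse_fields(text):
    """Map each 'label: value' line to a dictionary entry."""
    fields = {}
    for line in text.splitlines():
        # partition splits at the first colon only, so values may contain colons
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon at all
            fields[key.strip()] = value.strip()
    return fields

string = '''Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00'''
fields = parse_fields(string)
print(fields["Receipt No"])
print(fields["Invoice amount"])
```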
