
Printing a single line in a multi-line string

I managed to use pytesseract to convert an invoice image into text.

The multi-line string looks like this:

Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00

I would like to extract just the invoice number (i.e. 20191220.001) using a substring. I managed to get the start index with index = string.find('Receipt No: '), but when I slice the string with print(string[index:]) I get the following result:

20191220.001
Date: 20 December 2019
Invoice amount: $400.00

But I only want the first line. Invoice numbers are not fixed at 12 characters; they may be longer or shorter depending on the vendor. How do I extract only the invoice number? I'm doing this to automate an accounting process.

You can use split:

s = '''Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00'''

number = s.split('Receipt No: ')[1].split('\n')[0]
print(number)

Output:

20191220.001

Or, if you want to use find, you can do it this way (this works when the receipt number is on the first line):

index1 = s.find(':')
index2 = s.find('\n')
print(s[index1+1:index2].strip())

Try a regular expression:

import re
s = """
Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00"""
# Raw string; the dot is escaped so it matches a literal '.' only
p = re.compile(r"Receipt No: (\d+\.\d+)")
result = p.search(s)
number = result.group(1)  # '20191220.001'
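
Since the question says invoice numbers vary by vendor, a looser pattern may be safer than digits-dot-digits. The following is only a sketch: the helper name receipt_number and the assumption that the number is a single token with no spaces are mine, not from the question.

```python
import re

# Assumption: the number is one whitespace-free token on the same line as the label.
# re.MULTILINE lets ^ match at the start of any line, not only the first one.
RECEIPT_RE = re.compile(r"^Receipt No:[ \t]*(\S+)", re.MULTILINE)

def receipt_number(text):
    """Return the token after 'Receipt No:' or None if the label is absent."""
    match = RECEIPT_RE.search(text)
    return match.group(1) if match else None

sample = """Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00"""

print(receipt_number(sample))                     # 20191220.001
print(receipt_number("Date: 20 December 2019"))   # None
```

Returning None instead of calling group(1) on a failed search avoids an AttributeError when the OCR output does not contain the label.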

Split your string into a list on "\n". Each line of the string becomes a list element, and you can then take the part you want.

string = """Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00"""

your_list = string.split("\n")
data = your_list[0]  # 'Receipt No: 20191220.001'

You can also use the split function when reading the text from a file.

with open("filename", 'r') as dataload:
    for i in dataload.readlines():
        if "Receipt No:" in i:
            print(i.split(":")[1].strip())

Output:

20191220.001

You can change the string tested in if "Receipt No:" in i: to match whichever field you need.
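
One way to make the label configurable is to wrap the loop in a small function. This is a sketch; extract_field is a hypothetical name, and it accepts any iterable of lines (an open file or text.splitlines()). It uses partition rather than split(":")[1] so that values which themselves contain a colon are not cut short.

```python
def extract_field(lines, label):
    """Return the text after `label` on the first line containing it, else None."""
    for line in lines:
        if label in line:
            # partition splits on the first occurrence of the label only
            return line.partition(label)[2].strip()
    return None

text = """Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00"""

print(extract_field(text.splitlines(), "Receipt No:"))      # 20191220.001
print(extract_field(text.splitlines(), "Invoice amount:"))  # $400.00
```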

If you only care about the first line, you can use the first occurrence of the newline character as the end of your number. Note that the number starts where the substring "Receipt No: " ends, whereas find returns the index where the substring starts, so you need to add the length of the substring.

string = '''Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00'''
sub = 'Receipt No: '
start = string.find(sub) + len(sub)
end = string.find('\n')
print(string[start:end])
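
A variation on the find approach, for cases where the label is not on the first line or the text does not end with a newline. This is a sketch under those assumptions; value_after is a hypothetical helper name.

```python
def value_after(text, label):
    """Return the rest of the line following `label`, or None if it is missing."""
    start = text.find(label)
    if start == -1:          # find returns -1 when the label is absent
        return None
    start += len(label)
    # Search for the newline only after the label, not from the start of the text
    end = text.find("\n", start)
    if end == -1:            # label is on the last line with no trailing newline
        end = len(text)
    return text[start:end].strip()

ocr_text = "Date: 20 December 2019\nReceipt No: 20191220.001"
print(value_after(ocr_text, "Receipt No: "))  # 20191220.001
```

Without the two -1 checks, a missing label or a missing newline would silently produce a wrong slice instead of an error.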

If you also care about the other lines, you can use split and process each line separately.

lines = string.split('\n')
sub = 'Receipt No: '
index = lines[0].find(sub) + len(sub)
print(lines[0][index:])
# Process line 1
# Process line 2
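
If every line has the form "label: value", as in the sample, the per-line processing could be collapsed into a dictionary. This is a sketch assuming that layout; parse_fields is a hypothetical name, and real OCR output may need extra cleanup.

```python
def parse_fields(text):
    """Map each 'label: value' line to a dict entry; lines without ':' are skipped."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

text = """Receipt No: 20191220.001
Date: 20 December 2019
Invoice amount: $400.00"""

fields = parse_fields(text)
print(fields["Receipt No"])      # 20191220.001
print(fields["Invoice amount"])  # $400.00
```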
