
Python get N characters from string with specific substring

I have a very long string that I extracted from an image file. The string can look like this:

...\n\nDate: 01.01.2022\n\nArticle-no: 123456789\n\nArticle description: asdfqwer 1234...\n...

How do I extract just the 10 characters after the substring "Article-no:"?

I tried a different approach using find and rfind, like this, but it fails every now and then when the start or end string does not match the text exactly.

    s = "... string shown above ..."
    start = "Article-no: "
    end = "Article description: "
    print(s[s.find(start)+len(start):s.rfind(end)])

You can use split:

s.split("Article-no: ", 1)[1][:10]
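Note that the split one-liner raises an IndexError when the marker is missing, which can happen with text extracted from an image. A minimal sketch of a more defensive variant, using str.partition instead of split (the helper name chars_after is made up for illustration):

```python
def chars_after(text, marker, n=10):
    """Return up to n characters following the first occurrence of marker.

    Returns None when marker does not occur in text.
    """
    # partition() returns (before, marker, after); the middle element
    # is an empty string when the marker was not found.
    _, found, rest = text.partition(marker)
    if not found:
        return None
    return rest[:n]


s = "...\n\nDate: 01.01.2022\n\nArticle-no: 123456789\n\nArticle description: asdfqwer 1234...\n..."
print(repr(chars_after(s, "Article-no: ")))  # prints '123456789\n'
```

With the sample string the article number has only 9 digits, so a fixed 10-character slice also picks up the trailing newline. Calling .strip() on the result removes it.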

For this, a regular expression might come in very handy.

import re

# Create a pattern which matches "Article-no: " literally,
# and then grabs the digits that follow.
pattern = re.compile(r"Article-no: (\d+)")
s = "...\n\nDate: 01.01.2022\n\nArticle-no: 123456789\n\nArticle description: asdfqwer 1234...\n..."

match = pattern.search(s)
if match:
    print(match.group(1))

This outputs:

123456789

The regular expression used is Article-no: (\d+), which has the following parts:

Article-no:      # Match this text literally
(                # Open a new group (i.e. group 1)
\d+              # Match 1 or more occurrences of a digit
)                # Close group 1

pattern.search(s) scans the string for the first place where the pattern matches, and match.group(1) then returns the digits captured by group 1.
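If the article number is not guaranteed to be purely numeric, or the spacing after the colon varies, the pattern can be loosened. The following is only a sketch under those assumptions; the 10-character limit comes from the question, and the variable name article_no is illustrative:

```python
import re

# \s* tolerates missing or extra whitespace after the colon;
# \S{1,10} captures between 1 and 10 non-whitespace characters.
pattern = re.compile(r"Article-no:\s*(\S{1,10})")

s = "...\n\nDate: 01.01.2022\n\nArticle-no: 123456789\n\nArticle description: asdfqwer 1234...\n..."

match = pattern.search(s)
article_no = match.group(1) if match else None
print(article_no)  # prints 123456789
```

Because \S stops at the first whitespace character, a 9-digit number is returned without the newline that follows it, while a longer value is cut off after 10 characters.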
