Python: Creating Regular Expressions with Non-Conforming Text in the Middle of String

I'm trying to create a regular expression that captures data at the beginning and the end of a string, but not in the middle. Here's a simplified example that gets the concept across:

Player Hero wins the game on last minute goal. Score: 2. Opponent: 1. Points: 3.
Player Doug loses the game. Score: 1. Opponent: 2. Points: 0
Player Hero loses the game. Score: 1. Opponent: 3. Points: 0.
Player Guy wins the game. Score: 3. Opponent: 1. Points: 3.
Player Hero ties the game [2ycs]. Score: 2. Opponent: 2. Points: 1.
Player Jim has a tough go of it [1yc]. Score: 0. Opponent: 7. Points: 0.

What I need is a regular expression that grabs "Player Hero", but ignores the middle part of the text, and instead grabs the "Score: 2. Opponent: 1. Points: 3." data part to go along with "Player Hero" (note: I don't want the data for the other players.)

I get how to capture the beginning with:

re.compile('Player Hero')

And the end with:

re.compile('Score: \\d*\\. Opponent: \\d*\\. Points: \\d\\.')

Where I'm struggling is figuring out how to deal with the non-conforming text in the middle of the strings, so that I can essentially combine the two regular expressions above.

I believe the pattern you are looking for is just:

^Player Hero .+ Score: \d*\. Opponent: \d*\. Points: \d\.$

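.+ matches one or more of any character except a newline
^ matches the beginning of a line (in Python this requires the re.MULTILINE flag; without it, ^ matches only at the start of the string)
$ matches the end of a line (again with re.MULTILINE; without it, $ matches only at the end of the string)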

You can try it out here: https://regex101.com/r/Y4FMXZ/1

Note that the 3rd occurrence is not a match because Score doesn't have a colon, but I'm assuming that was a typo. Also, there is whitespace at the end of that line, which stops $ from matching right after the final period. If trailing whitespace can occur, just remove the $.
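To make the effect of the anchors concrete, here is a minimal Python sketch. It assumes the sample has one record per line and that the second "Player Hero" line ends with a trailing space, as described above; the variable names are illustrative only.

```python
import re

# Sample records, one per line. The trailing space on the third
# record is an assumption used to illustrate the `$` problem.
lines = [
    "Player Hero wins the game on last minute goal. Score: 2. Opponent: 1. Points: 3.",
    "Player Doug loses the game. Score: 1. Opponent: 2. Points: 0",
    "Player Hero loses the game. Score: 1. Opponent: 3. Points: 0. ",
    "Player Guy wins the game. Score: 3. Opponent: 1. Points: 3.",
    "Player Hero ties the game [2ycs]. Score: 2. Opponent: 2. Points: 1.",
    "Player Jim has a tough go of it [1yc]. Score: 0. Opponent: 7. Points: 0.",
]
text = "\n".join(lines)

body = r"^Player Hero .+ Score: \d*\. Opponent: \d*\. Points: \d\."

# With `$`, the record that ends in a space is rejected.
strict_matches = re.findall(body + "$", text, flags=re.MULTILINE)

# Without `$`, the match simply stops after the final period.
relaxed_matches = re.findall(body, text, flags=re.MULTILINE)

# Without re.MULTILINE, `^` and `$` only anchor to the whole string,
# so nothing matches in a multi-line input.
no_flag_matches = re.findall(body + "$", text)

print(len(strict_matches), len(relaxed_matches), len(no_flag_matches))
```

Under these assumptions the strict pattern finds two records, the relaxed one finds all three "Player Hero" records, and the version without the flag finds none.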

If you are interested in capturing the numbers, just put them in capturing groups using parentheses.

^Player Hero .+ Score: (\d*)\. Opponent:? (\d*)\. Points: (\d)\.$
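As a rough illustration of reading the captured numbers in Python, the sketch below uses named groups, which are a convenience added here rather than part of the pattern above. It also assumes one record per line, drops the trailing $, and uses \d+ so multi-digit values are accepted.

```python
import re

# Sample from the question, assumed to be one record per line.
text = "\n".join([
    "Player Hero wins the game on last minute goal. Score: 2. Opponent: 1. Points: 3.",
    "Player Doug loses the game. Score: 1. Opponent: 2. Points: 0",
    "Player Hero loses the game. Score: 1. Opponent: 3. Points: 0.",
    "Player Guy wins the game. Score: 3. Opponent: 1. Points: 3.",
    "Player Hero ties the game [2ycs]. Score: 2. Opponent: 2. Points: 1.",
    "Player Jim has a tough go of it [1yc]. Score: 0. Opponent: 7. Points: 0.",
])

# Named groups (?P<name>...) label each captured number.
pattern = re.compile(
    r"^Player Hero .+ Score: (?P<score>\d+)\. "
    r"Opponent: (?P<opponent>\d+)\. Points: (?P<points>\d+)\.",
    re.MULTILINE,
)

# One dict per matching record, with the digits converted to int.
results = [
    {name: int(value) for name, value in match.groupdict().items()}
    for match in pattern.finditer(text)
]

for row in results:
    print(row)
```

Each dict in results corresponds to one "Player Hero" line; records for other players are skipped because the pattern is anchored on that name.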

Because ^ and $ need to match at every line rather than only at the ends of the whole string, you must use the multiline (?m) directive:

import re

# text is assumed to hold the sample above, one record per line
for item in re.findall(r"(?m)^Player Hero.*?Score:\s*(\d+)\.\s*Opponent:\s*(\d+)\.\s*Points:\s*(\d+)\.?\s*$", text):
    score, oppo, points = item
    print(f"score:{score},oppo:{oppo},points:{points}")

score:2,oppo:1,points:3
score:1,oppo:3,points:0
score:2,oppo:2,points:1


 