
Regex (Python) - Match everything before desired word

I simply want to strip from my file every character (carriage returns included) that comes before the first appearance of the string "From:".

Example text:

"File name 123 file date xxxxx

other text

From: john@example.com ...."

I can't seem to just grab everything before "From:", which I thought would be a simple one-liner, but no. Any help would be greatly appreciated. Many thanks.

You may try this regex,

(?s).*?(From:.*)

And replace it with \1

Explanation:

(?s) --> Enables . to match newlines
.*? --> Lazily matches everything before the first occurrence of From:
(From:.*) --> Matches From: and the rest of the input, and stores it in group 1

Demo, https://regex101.com/r/Q8eFKL/2
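
In Python, the substitution could look something like the sketch below. It assumes the goal is to keep "From:" itself in the output, so the capture group is written to include it; the sample text and variable names are only illustrative.

```python
import re

# Sample input modeled on the question (illustrative only).
text = "File name 123 file date xxxxx\n\nother text\n\nFrom: john@example.com ...."

# (?s) lets . match newlines; .*? lazily consumes up to the first "From:";
# the group captures "From:" plus everything after it, and \1 puts that back.
stripped = re.sub(r"(?s).*?(From:.*)", r"\1", text, count=1)

print(stripped)
```

If the text contains no "From:" at all, the pattern does not match and re.sub returns the input unchanged.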

Use a positive lookahead:

>>> re.findall(r'^(.*?)(?=From:)', your_text, re.DOTALL)

The lookahead keeps the pattern from matching text that doesn't contain "From:", that is, input that may not be formatted the way you expect.
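
A rough sketch of how the lookahead can be used to actually drop the leading text: the match ends right where "From:" begins, so slicing from match.end() keeps the rest. The helper name split_before_marker and its marker parameter are hypothetical, and re.DOTALL is assumed so that . can cross line breaks.

```python
import re

def split_before_marker(text, marker="From:"):
    """Return (prefix, rest) split at the first marker, or None if it is absent."""
    # Lazy .*? stops at the first position where the lookahead sees the marker.
    pattern = r"^.*?(?=" + re.escape(marker) + r")"
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return None  # no marker: the input is not formatted as expected
    return match.group(0), text[match.end():]

sample = "File name 123\r\n\r\nother text\r\n\r\nFrom: john@example.com\r\nFrom: jane@example.com"
result = split_before_marker(sample)
```

Returning None when the marker is missing makes the "not formatted as expected" case explicit for the caller instead of silently passing the text through.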

By default, the dot (.) matches any character except a line break, so my approach would be:

(.|\n|\r)*(?=From:)
  • 1st Alternative .
    • . matches any character (except for line terminators)
  • 2nd Alternative \n
    • \n matches a line-feed (newline) character (ASCII 10)
  • 3rd Alternative \r
    • \r matches a carriage return (ASCII 13)
  • Positive Lookahead (?=From:)
    • Assert that the regex below matches
    • From: matches the characters From: literally (case sensitive)
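
One thing worth checking with this pattern is that the * quantifier is greedy, so when the text contains more than one "From:" it runs up to the last one rather than the first. The sketch below compares it with a lazy variant (*?), which is an assumption added here rather than part of the answer above; the sample text is illustrative.

```python
import re

# Two "From:" markers, to show the difference between greedy and lazy matching.
text = "a\nFrom: x\nFrom: y"

# Pattern from the answer: greedy, so it backtracks only as far as the LAST "From:".
greedy = re.match(r"(.|\n|\r)*(?=From:)", text)

# Lazy variant: stops at the FIRST position where "From:" follows.
lazy = re.match(r"(.|\n|\r)*?(?=From:)", text)

after_greedy = text[greedy.end():]
after_lazy = text[lazy.end():]

print(repr(after_greedy))
print(repr(after_lazy))
```

For text with a single "From:" both variants give the same result, so the choice only matters when the marker can repeat.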


 