Grabbing a body of text using regex excluding specific conditions

I'm using Python regex to grab the body of a parsed email which may contain nothing or may look something like this:

Some coherent sentence.

lalskjfa;ljkd

the other stuff

____________________
A couple of lines of email signature blah blah

blah blah blah


I want everything in the body of the email EXCLUDING the signature opener (a line starting with _) and everything after it.

I'm basically stripping the signature from the email so that I can reformat the rest for reporting.

I've tried:

  • negative lookahead: \G(\A\z|.*\n*(?!_))

  • positive lookahead: \G(\A\z|.*\n*(?=_))

Neither seems to be doing the trick.

With a negative lookahead, it seems to be grabbing everything. With a positive lookahead, it seems to be grabbing nothing.

The output I'm hoping to achieve is this text:

Some coherent sentence.

lalskjfa;ljkd

the other stuff

You may use

(?s)\A.*?(?=\n_)

It matches:

  • (?s) - the re.DOTALL inline flag, so that . also matches newlines
  • \A - the start of the string
  • .*? - any 0+ chars, as few as possible, up to the first occurrence of
  • (?=\n_) - a newline followed by a _ char (a lookahead, so neither is included in the match).
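
As a minimal sketch of how this pattern might be used from Python, assuming the signature opener is a line that starts with _ and is preceded by a newline: the helper name extract_body and the sample text are made up for illustration, and falling back to the whole text when no opener is found is an assumption, not something the question specifies.

```python
import re

# Pattern from the answer: lazily match from the start of the string
# up to (but not including) the first newline that is followed by "_".
BODY_RE = re.compile(r"(?s)\A.*?(?=\n_)")


def extract_body(text):
    """Return the text before the signature opener.

    Assumption: if no "\\n_" is found (no signature, or an empty body),
    the input is returned unchanged.
    """
    match = BODY_RE.search(text)
    return match.group(0) if match else text


# Hypothetical sample modelled on the email in the question.
sample = (
    "Some coherent sentence.\n"
    "\n"
    "lalskjfa;ljkd\n"
    "\n"
    "the other stuff\n"
    "\n"
    "____________________\n"
    "A couple of lines of email signature blah blah\n"
    "\n"
    "blah blah blah\n"
)

# rstrip() drops the trailing newline left before the opener.
body = extract_body(sample).rstrip()
print(body)
```

Note that the lookahead needs a newline before the underscore, so a body that begins with the opener on its very first line would not be matched by this pattern and would be returned unchanged by the sketch above.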