
Regex (Python) - Match everything before desired word

I simply want to strip my file of every character, carriage return, etc. before the first appearance of the string "From:".

Text example:

"File name 123 file date xxxxx

other text

From: john@example.com ...."

I can't seem to grab everything before "From:", which I thought would be a simple one-liner. Any help would be greatly appreciated. Many thanks.

You may try this regex:

(?s).*?(From:.*)

and replace it with \1 (in Python: re.sub(r'(?s).*?(From:.*)', r'\1', text)), which keeps "From:" and everything after it.

Explanation:

(?s) --> Enables . to match newlines
.*? --> Lazily matches everything before the first occurrence of From:
(From:.*) --> Matches From: plus the rest of the input and stores it in group 1

Demo: https://regex101.com/r/Q8eFKL/2
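A minimal Python sketch of this substitution, assuming the file contents have already been read into a string. It captures "From:" inside the group, (?s).*?(From:.*), so the marker itself survives the replacement. The helper name strip_before_from and the sample text are hypothetical.

```python
import re

# (?s) is the inline DOTALL flag: "." may cross line breaks, so ".*?" can
# skip several lines. The lazy "*?" stops at the FIRST "From:".
PATTERN = re.compile(r"(?s).*?(From:.*)")


def strip_before_from(text: str) -> str:
    """Drop everything before the first "From:" (hypothetical helper).

    If "From:" never occurs, the pattern does not match and the text
    is returned unchanged.
    """
    # count=1: a single match suffices, because group 1 runs to the end.
    return PATTERN.sub(r"\1", text, count=1)


sample = "File name 123 file date xxxxx\r\n\r\nother text\r\n\r\nFrom: john@example.com ...."
print(repr(strip_before_from(sample)))  # 'From: john@example.com ....'
```

Because the group starts at "From:", only the prefix is discarded; with a pattern like .*?From(.*) the word "From" itself would be dropped as well.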

Use a positive lookahead:

>>> re.findall(r'^(.*?)(?=From:)', your_text, re.DOTALL)

Because of the lookahead, nothing is matched in text that doesn't contain "From:" (and so may not be formatted the way you expect); findall then simply returns an empty list.
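Note that the lookahead version returns the text *before* "From:" rather than removing it. Here is a minimal sketch of turning that into the strip the question asks for. It assumes re.DOTALL is passed, because without it "." cannot cross line breaks and the match would only succeed when "From:" sits on the first line. The helper name and sample text are hypothetical.

```python
import re


def strip_before_from(text: str) -> str:
    # ^ anchors at the start of the string; the lazy (.*?) with DOTALL
    # consumes everything up to, but not including, the first "From:".
    m = re.search(r"^(.*?)(?=From:)", text, re.DOTALL)
    if m is None:
        # No "From:" anywhere: leave the input untouched.
        return text
    # The lookahead consumes nothing, so m.end() is where "From:" begins.
    return text[m.end():]


sample = "File name 123 file date xxxxx\n\nother text\n\nFrom: john@example.com ...."
prefix = re.findall(r"^(.*?)(?=From:)", sample, re.DOTALL)
print(prefix)                     # the single captured prefix, in a list
print(strip_before_from(sample))  # From: john@example.com ....
```

re.search is used in the helper because it gives the match position directly; findall only returns the captured strings.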

The dot (.) matches everything except a line break, so my approach would be:

(.|\n|\r)*?(?=From:)
  • 1st Alternative .
    • . matches any character except a line terminator (in Python's re, only \n is excluded)
  • 2nd Alternative \n
    • \n matches a line-feed (newline) character (ASCII 10)
  • 3rd Alternative \r
    • \r matches a carriage return (ASCII 13)
  • Lazy quantifier *?
    • Repeats the group as few times as possible, so the match stops at the first From: rather than the last
  • Positive Lookahead (?=From:)
    • Asserts that what follows is From:, matching the characters
      From: literally (case sensitive)
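A quick Python check of the alternation approach, as a sketch with a hypothetical sample string. In Python's re module "." without flags excludes only "\n", so the "\r" alternative is redundant there, and re.DOTALL gives the same match with a simpler pattern. The lazy *? makes the match stop at the first "From:".

```python
import re

sample = "File name 123 file date xxxxx\r\n\r\nother text\r\n\r\nFrom: john@example.com ...."

# "." already matches "\r" without any flags; only "\n" is excluded.
assert re.match(r".", "\r") is not None
assert re.match(r".", "\n") is None

# Alternation version with a lazy quantifier.
alt = re.search(r"(.|\n|\r)*?(?=From:)", sample)

# Equivalent with DOTALL: "." now matches "\n" as well.
dotall = re.search(r".*?(?=From:)", sample, re.DOTALL)

# Compare group(0), the whole prefix. group(1) of the alternation version
# would hold only the LAST repetition (a single character).
assert alt is not None and dotall is not None
assert alt.group(0) == dotall.group(0)

# Everything from the first "From:" onward is what remains after stripping.
print(sample[alt.end():])  # From: john@example.com ....
```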
