
How to remove text between two specific words in Python

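I parsed a URL with the Beautiful Soup package to get its text. I want to remove all the text in the terms and conditions section, i.e. every word in the passage "Key Terms; ... T&Cs apply".

Here is what I tried: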

import re

# "text" is part of the text contained in the URL
text = """Welcome to Company Key.
Key Terms; Single bets only. Any returns from the free bet will be paid
back into your account minus the free bet stake. Free bets can only be
placed at maximum odds of 5.00 (4/1). Bonus will expire midnight, Tuesday
26th February 2019. Bonus T&Cs and General T&Cs apply.
"""
# Intended to capture the words between "Key" and "T&Cs"
rex = re.compile(r'Key\ (.*?)T&Cs.')
terms_and_cons = rex.findall(text)
text = re.sub("|".join(terms_and_cons), " ", text)
# I also tried: text = re.sub(terms_and_cons[0], " ", text)
print(text)

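Even though the list "terms_and_cons" is non-empty, the code above just leaves the string "text" unchanged. How can I successfully remove the words between "Key" and "T&Cs"? Please help. I have been stuck on this supposedly simple piece of code for quite a while, and it is getting very frustrating. Thank you.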

Your regular expression is missing the re.DOTALL flag, which makes the dot match newline characters as well.
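To see the effect of the flag in isolation, here is a minimal sketch that runs one pattern with and without re.DOTALL. The strings `sample` and `captured` are made up for illustration and are not taken from the page being parsed.

```python
import re

# Hypothetical two-line sample; a newline sits between "Key" and "T&Cs".
sample = "Key Terms; line one\nline two. Bonus T&Cs apply."
pattern = r"Key\s(.*?)T&Cs"

# By default "." matches any character except a newline, so nothing is found.
without_flag = re.search(pattern, sample)

# With re.DOTALL, "." also matches "\n", so the match can span both lines.
with_flag = re.search(pattern, sample, re.DOTALL)

print(without_flag)
print(repr(with_flag.group(1)))

# Separate pitfall: captured text reused as a pattern is read as a regex,
# so "(4/1)" becomes a group and "." a wildcard unless it is escaped.
captured = "odds of 5.00 (4/1)"
unescaped = re.search(captured, captured)
escaped = re.search(re.escape(captured), captured)
print(unescaped, escaped is not None)
```

The second half is relevant to the attempt in the question, where the captured text is passed back to re.sub as a pattern: wrapping it in re.escape makes it match literally.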

Method 1: using re.sub

import re

text = """Welcome to Company Key.
Key Terms; Single bets only. Any returns from the free bet will be paid
back into your account minus the free bet stake. Free bets can only be
placed at maximum odds of 5.00 (4/1). Bonus will expire midnight, Tuesday
26th February 2019. Bonus T&Cs and General T&Cs apply.
"""

# Raw string so "\s" reaches the regex engine; the greedy ".*" runs to the last "T&Cs"
rex = re.compile(r"Key\s(.*)T&Cs", re.DOTALL)
# Replace the whole match, keeping only the two delimiting words
text = rex.sub("Key T&Cs", text)
print(text)
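If the goal is to drop the whole passage, delimiting words included, one possible generalization is a small helper. This is only a sketch: `remove_between`, its parameters, and the sample string are hypothetical names introduced here, not part of the `re` module or of the original answer. The markers go through re.escape so that characters such as "&" or "." are taken literally.

```python
import re

def remove_between(text, start, end, greedy=True):
    """Delete everything from `start` through `end`, markers included.

    Hypothetical helper for illustration; not part of the `re` module.
    """
    middle = ".*" if greedy else ".*?"
    # re.escape makes the markers literal; re.DOTALL lets "." cross newlines.
    pattern = re.compile(re.escape(start) + middle + re.escape(end), re.DOTALL)
    return pattern.sub("", text)

# Made-up sample with two occurrences of "T&Cs".
sample = "Intro.\nKey Terms; a\nb. Bonus T&Cs and General T&Cs apply.\nOutro."

# Greedy: the match extends to the last occurrence of the end marker.
whole = remove_between(sample, "Key Terms", "T&Cs apply.")
# Lazy: the match stops at the first occurrence of the end marker.
first_only = remove_between(sample, "Key Terms", "T&Cs", greedy=False)

print(repr(whole))
print(repr(first_only))
```

Greedy versus lazy matters here because the sample text contains "T&Cs" twice; which one is right depends on where the terms section is meant to end.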

Method 2: using a group

Match the text with a capturing group, then remove that group's text from the original string.

import re

text = """Welcome to Company Key.
Key Terms; Single bets only. Any returns from the free bet will be paid
back into your account minus the free bet stake. Free bets can only be
placed at maximum odds of 5.00 (4/1). Bonus will expire midnight, Tuesday
26th February 2019. Bonus T&Cs and General T&Cs apply.
"""

rex = re.compile(r"Key\s(.*)T&Cs", re.DOTALL)
matches = rex.search(text)
if matches:  # search returns None when the pattern is not found
    # group(1) is the text between "Key " and the last "T&Cs"
    text = text.replace(matches.group(1), "")
print(text)
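One caveat with this approach: str.replace removes every occurrence of the captured text, not only the one that was matched. Where that matters, a possible variant is to cut the string at the group's offsets instead. A minimal sketch on a made-up sample string:

```python
import re

# Hypothetical sample, shortened for illustration.
sample = "Key Terms; free bet.\nMore rules. T&Cs apply."
rex = re.compile(r"Key\s(.*)T&Cs", re.DOTALL)

match = rex.search(sample)
if match:
    # span(1) gives the start and end offsets of the first capturing group.
    start, end = match.span(1)
    result = sample[:start] + sample[end:]
else:
    # No match: leave the text as it is.
    result = sample

print(result)
```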

