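How to remove text between two specific words in Python
I parsed a URL with the Beautiful Soup package to get its text. I want to remove all the text in the terms and conditions section, i.e. all the words in the paragraph "Key Terms: ... T&Cs apply".
Here is what I have tried: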
import re
# "text" is part of the text contained in the URL
text = """Welcome to Company Key.
Key Terms; Single bets only. Any returns from the free bet will be paid
back into your account minus the free bet stake. Free bets can only be
placed at maximum odds of 5.00 (4/1). Bonus will expire midnight, Tuesday
26th February 2019. Bonus T&Cs and General T&Cs apply.
"""
# Intended to remove the words between "Key" and "T&Cs"
rex = re.compile(r'Key\ (.*?)T&Cs.')
terms_and_cons = rex.findall(text)
text = re.sub("|".join(terms_and_cons), " ", text)
# I also tried: text = re.sub(terms_and_cons[0], " ", text)
print(text)
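Even though the list "terms_and_cons" is non-empty, the code above leaves the string "text" unchanged. How can I successfully remove the words between "Key" and "T&Cs"? Please help. I have been stuck on this supposedly simple piece of code for quite a while, and it is getting very frustrating. Thank you.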
Your regex is missing the re.DOTALL flag, which makes the dot match newline characters as well.
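To illustrate the point about re.DOTALL, here is a minimal sketch on a shortened two-line string (the variable names `sample`, `no_flag` and `with_flag` are made up for this example). Without the flag the dot stops at the line break, so the pattern cannot reach "T&Cs" on the next line.

```python
import re

# A shortened, hypothetical stand-in for the real text: the two marker
# words sit on different lines.
sample = "Key one\ntwo T&Cs"

# Without re.DOTALL, "." does not match "\n", so the search fails.
no_flag = re.search(r"Key\s(.*)T&Cs", sample)

# With re.DOTALL, "." also matches "\n", so the match can span lines.
with_flag = re.search(r"Key\s(.*)T&Cs", sample, re.DOTALL)

print(no_flag)                    # None
print(repr(with_flag.group(1)))   # 'one\ntwo '
```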
Method 1: Using re.sub
import re
text="""Welcome to Company Key.
Key Terms; Single bets only. Any returns from the free bet will be paid
back into your account minus the free bet stake. Free bets can only be
placed at maximum odds of 5.00 (4/1). Bonus will expire midnight, Tuesday
26th February 2019. Bonus T&Cs and General T&Cs apply.
"""
# Raw string so that "\s" reaches the regex engine unchanged
rex = re.compile(r"Key\s(.*)T&Cs", re.DOTALL)
text = rex.sub("Key T&Cs", text)
print(text)
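One detail worth checking with this approach: "T&Cs" occurs twice in the last sentence, and the greedy `(.*)` runs up to the last occurrence. If the match should stop at the first "T&Cs" instead, a lazy `(.*?)` can be used. The sketch below uses a shortened version of the text, and which behaviour is wanted is an assumption left to the reader.

```python
import re

# Shortened version of the sample text; "T&Cs" appears twice.
text = "Key Terms; Single bets only.\nBonus T&Cs and General T&Cs apply.\n"

# Greedy: removes everything up to the LAST "T&Cs".
greedy = re.sub(r"Key\s(.*)T&Cs", "Key T&Cs", text, flags=re.DOTALL)

# Lazy: removes everything up to the FIRST "T&Cs".
lazy = re.sub(r"Key\s(.*?)T&Cs", "Key T&Cs", text, flags=re.DOTALL)

print(repr(greedy))  # 'Key T&Cs apply.\n'
print(repr(lazy))    # 'Key T&Cs and General T&Cs apply.\n'
```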
Method 2: Using groups
Match the text with a capturing group, then remove that group's text from the original string.
import re
text="""Welcome to Company Key.
Key Terms; Single bets only. Any returns from the free bet will be paid
back into your account minus the free bet stake. Free bets can only be
placed at maximum odds of 5.00 (4/1). Bonus will expire midnight, Tuesday
26th February 2019. Bonus T&Cs and General T&Cs apply.
"""
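# Raw string so that "\s" reaches the regex engine unchanged
rex = re.compile(r"Key\s(.*)T&Cs", re.DOTALL)
matches = rex.search(text)
# Only remove the captured text if the pattern actually matched
if matches:
    text = text.replace(matches.group(1), "")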
print(text)
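A related pitfall, relevant to the attempt in the question: `str.replace` treats the captured text literally, but passing captured text to `re.sub` as a pattern does not. Characters such as "(", ")" and "." are then read as regex syntax, so the pattern may no longer match the very text it came from. A minimal sketch, using a shortened one-line string and `re.escape` (the variable names are made up for this example):

```python
import re

# Shortened, hypothetical one-line version of the text.
text = "Key odds of 5.00 (4/1). Bonus T&Cs apply."

match = re.search(r"Key\s(.*)T&Cs", text, re.DOTALL)
captured = match.group(1) if match else ""

# Used as-is, "(4/1)" is parsed as a group that matches "4/1" without
# the literal parentheses, so nothing in the text matches.
unescaped = re.sub(captured, "", text) if captured else text

# re.escape turns the captured text into a literal pattern.
escaped = re.sub(re.escape(captured), "", text) if captured else text

print(unescaped == text)  # True: nothing was removed
print(escaped)            # Key T&Cs apply.
```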