Get words between specific words in a Python string
I'm trying to extract the words between certain words in a string.
Following the article "Find string between two substrings", I was able to capture the words in the following way.
import re
s = 'asdf=5;iwantthis123jasd'
result = re.search(r'asdf=5;(.*)123jasd', s)
print(result.group(1))
But it failed on the string below.
s = ''' <div class="prod-origin-price ">
<span class="discount-rate">
4%
</span>
<span class="origin-price">'''
result = re.search('<span class="discount-rate">(.*)</span>', s)
print(result.group(1))
I'm trying to extract '4%'. Everything else succeeds, but I don't know why only this one fails. Any help would be appreciated.
Try this (mind the whitespace and newlines):
import re
s = ''' <div class="prod-origin-price ">
<span class="discount-rate">
4%
</span>
<span class="origin-price">'''
# Raw string avoids invalid-escape warnings for \s; (.*?) stops at the first closing tag
result = re.search(r'<span class="discount-rate">\s*(.*?)\s*</span>', s)
print(result.group(1))
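To show why the original pattern fails and the one above works, here is a minimal sketch. By default `.` does not match a newline, so `(.*)` cannot cross the line breaks around `4%` and `re.search` returns `None`. The helper name `extract_rate` is hypothetical, added only to illustrate guarding against a failed match.

```python
import re

s = '''<div class="prod-origin-price ">
<span class="discount-rate">
4%
</span>
<span class="origin-price">'''

# Without re.DOTALL, "." stops at a newline, so this pattern cannot span lines.
failed = re.search(r'<span class="discount-rate">(.*)</span>', s)

# \s* consumes the newlines around the value; (.*?) captures the text lazily.
matched = re.search(r'<span class="discount-rate">\s*(.*?)\s*</span>', s)


def extract_rate(text):
    """Hypothetical helper: return the captured text, or None if nothing matches."""
    m = re.search(r'<span class="discount-rate">\s*(.*?)\s*</span>', text)
    return m.group(1) if m else None


print(failed)                 # None, which is why result.group(1) raised an error
print(matched.group(1))       # 4%
print(extract_rate('no span here'))  # None instead of an AttributeError
```

Checking the result for `None` before calling `.group(1)` avoids the `AttributeError` that the question's code runs into.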
Use the re.DOTALL flag so that . also matches newlines:
result = re.search('<span class="discount-rate">(.*)</span>', s, re.DOTALL)
Documentation: https://docs.python.org/3/library/re.html
This is structured data, not just a string, so we can use a library like Beautiful Soup to simplify such tasks:
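Two details are worth keeping in mind with `re.DOTALL`, sketched below. The captured group includes the surrounding newlines, so it usually needs `.strip()`, and a greedy `(.*)` runs to the last `</span>` in the string. The second span and its `1,000` value are made up for illustration and are not part of the question's data.

```python
import re

# Illustrative input: the second span is closed here to show the greedy pitfall.
s = '''<span class="discount-rate">
4%
</span>
<span class="origin-price">
1,000
</span>'''

pattern_greedy = r'<span class="discount-rate">(.*)</span>'
pattern_lazy = r'<span class="discount-rate">(.*?)</span>'

greedy = re.search(pattern_greedy, s, re.DOTALL)
lazy = re.search(pattern_lazy, s, re.DOTALL)

# The raw group keeps the newlines around the value, so strip them.
greedy_value = greedy.group(1).strip()
lazy_value = lazy.group(1).strip()

print(repr(lazy.group(1)))  # '\n4%\n'
print(lazy_value)           # 4%
print('</span>' in greedy_value)  # True: the greedy match swallowed the next span
```

With only one closing tag in the string, the greedy and lazy patterns capture the same text, which is why the greedy version works on the question's input.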
from bs4 import BeautifulSoup
s = ''' <div class="prod-origin-price ">
<span class="discount-rate">
4%
</span>
<span class="origin-price">'''
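# Name the parser explicitly to avoid Beautiful Soup's "no parser specified" warning
soup = BeautifulSoup(s, 'html.parser')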
value = soup.find(class_='discount-rate').get_text(strip=True)
print(value)
# Output: 4%
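If installing Beautiful Soup is not an option, the same idea can be sketched with the standard library's `html.parser`. This is a different technique from the answer above, not how Beautiful Soup works internally. The class name `ClassTextExtractor` is hypothetical, and the sketch assumes the target element contains no void tags such as `<br>` written without a closing slash.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text of elements carrying target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.texts = []     # stripped text of each matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if self.depth:
            self.depth += 1          # nested tag inside the target element
        elif self.target_class in classes:
            self.depth = 1           # entering a matching element
            self._buffer = []

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:      # leaving the matching element
                self.texts.append(''.join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


s = ''' <div class="prod-origin-price ">
<span class="discount-rate">
4%
</span>
<span class="origin-price">'''

parser = ClassTextExtractor('discount-rate')
parser.feed(s)
parser.close()
print(parser.texts)
```

For anything beyond a quick extraction, Beautiful Soup remains the more convenient choice, since it handles malformed markup and richer selectors.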