Get words between specific words in a Python string

I'm trying to extract the text that sits between two specific substrings in a string.

Following the article "Find string between two substrings", I managed to capture the text in the following way:

import re

s = 'asdf=5;iwantthis123jasd'
result = re.search('asdf=5;(.*)123jasd', s)
print(result.group(1))

But it fails on the string below:

s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''


result = re.search('<span class="discount-rate">(.*)</span>', s)
print(result.group(1))

I'm trying to extract '4%'. Every other case works, but I don't know why only this one fails. Any help would be appreciated.

Try this (mind the whitespace and newlines):

import re
s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''


# Raw string so \s is a regex escape; \s* absorbs the newlines and indentation around the value.
result = re.search(r'<span class="discount-rate">\s*(.*?)\s*</span>', s)
print(result.group(1))
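
As a rough illustration of why the original pattern failed, the sketch below runs both patterns on a shortened copy of the string. By default `.` does not match a newline, so `(.*)` cannot cross the line break after the opening tag, while `\s` does match newlines. The variable names `plain` and `fixed` are only for this example.

```python
import re

# Shortened copy of the string from the question.
s = '''<span class="discount-rate">
    4%
</span>'''

# Without re.DOTALL, "." stops at a newline, so "(.*)" cannot span lines
# and the literal "</span>" is never reached.
plain = re.search(r'<span class="discount-rate">(.*)</span>', s)
print(plain)  # None

# "\s" matches newlines as well as spaces, so "\s*" skips the line breaks
# and indentation on both sides of the value.
fixed = re.search(r'<span class="discount-rate">\s*(.*?)\s*</span>', s)
print(fixed.group(1))  # 4%
```

Since `re.search` returns `None` when nothing matches, calling `.group(1)` on the first result would raise an `AttributeError`, which is the failure seen in the question.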

Use the re.DOTALL flag so that "." also matches newlines:

result = re.search('<span class="discount-rate">(.*)</span>', s, re.DOTALL)

Documentation: https://docs.python.org/3/library/re.html
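
Two things are worth keeping in mind with re.DOTALL: the captured group keeps the surrounding whitespace, and a greedy `.*` runs to the last `</span>` in the string. A minimal sketch of one way to handle both follows, assuming a hypothetical helper named `between` and a made-up second span (the value `100` is not from the question):

```python
import re

# Sample input: the second span and its value are invented for illustration.
s = '''<span class="discount-rate">
    4%
</span>
<span class="origin-price">
    100
</span>'''

def between(text, start, end):
    """Return the stripped text between `start` and the nearest following `end`, or None."""
    # re.escape treats the delimiters as literal text; ".*?" is non-greedy,
    # so the match stops at the first `end` after `start`.
    match = re.search(re.escape(start) + r'(.*?)' + re.escape(end), text, re.DOTALL)
    return match.group(1).strip() if match else None

# Greedy version: with DOTALL, ".*" swallows everything up to the last </span>.
greedy = re.search(r'<span class="discount-rate">(.*)</span>', s, re.DOTALL).group(1)
print('origin-price' in greedy)  # True

print(between(s, '<span class="discount-rate">', '</span>'))  # 4%
print(between(s, '<b>', '</b>'))  # None
```

Returning `None` instead of calling `.group(1)` unconditionally avoids an `AttributeError` when the delimiters are absent.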

This is structured data (HTML), not just a plain string, so a library like Beautiful Soup can simplify such tasks:

from bs4 import BeautifulSoup

s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''

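# Name the parser explicitly; omitting it makes Beautiful Soup guess and emit a warning.
soup = BeautifulSoup(s, 'html.parser')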
value = soup.find(class_='discount-rate').get_text(strip=True)
print(value)

# Output:
4%
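
If installing Beautiful Soup is not an option, the same parse-instead-of-regex idea can be sketched with the standard library's `html.parser` module. This is only a rough illustration, not a replacement for a full HTML library: the class name `ClassTextExtractor` is hypothetical, and the depth counting assumes the target element contains no void tags such as `<br>`.

```python
from html.parser import HTMLParser

class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first element that carries a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # nesting depth inside the target element
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            # Nested tag inside the target (void tags would miscount here).
            self.depth += 1
        else:
            classes = (dict(attrs).get('class') or '').split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

s = '''        <div class="prod-origin-price ">
        <span class="discount-rate">
            4%
        </span>
            <span class="origin-price">'''

parser = ClassTextExtractor('discount-rate')
parser.feed(s)
parser.close()
value = ''.join(parser.chunks).strip()
print(value)  # 4%
```

The parser tolerates the unclosed `<div>` and trailing `<span>` in the snippet, which is one reason a parser tends to be more robust than a regular expression for HTML fragments.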
