I would like to find 30,850
in:
<div class='user-information__achievements-heading' data-test-points-title>
Points
</div>
<div class='user-information__achievements-data' data-test-points-count>
30,850
</div>
</div>
with:
^(?!<div class='user-information__achievements-data' data-test-points-count>|<.div>)(.*)$
(returns nothing)
How come ^(?!START\-OF\-FIELDS|END\-OF\-FIELDS)(.*)$
works for:
START-OF-FIELDS
<div>
Line A
END-OF-FIELDS
(returns <div>)?
While I totally agree that you should never parse HTML with re (and it's really fun to read, btw), if you only have this piece of text and need a quick re.search, a simple r'\d+,\d+' would do:
import re
s = '''<div class='user-information__achievements-heading' data-test-points-title>
Points
</div>
<div class='user-information__achievements-data' data-test-points-count>
30,850
</div>
</div>'''
re.search(r'\d+,\d+', s)
<re.Match object; span=(179, 185), match='30,850'>
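No need for regex, just do:
# Strip each line so that any indentation in s does not break the lookup.
lines = [line.strip() for line in s.splitlines()]
i = "<div class='user-information__achievements-data' data-test-points-count>"
print(lines[lines.index(i) + 1])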
Output:
30,850
You can also extract the text with bs4 (BeautifulSoup):
from bs4 import BeautifulSoup
tx = """
<div class='user-information__achievements-heading' data-test-points-title>
Points
</div>
<div class='user-information__achievements-data' data-test-points-count>
30,850
</div>
</div>
"""
bs = BeautifulSoup(tx,"lxml")
result = bs.find("div",{"class":"user-information__achievements-data"}).text
print(result.strip()) # 30,850
You want re.DOTALL (re.S is its short alias), because by default . doesn't match newlines (line breaks):
re.compile(YOUR_REGEX, flags=re.S)
You can also prepend your regex with (?s) for the same effect.
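As a minimal sketch of the difference, assume the goal is to capture whatever sits between the data-test-points-count opening tag and the next closing div. The pattern and variable names below are illustrative and not taken from the question:

```python
import re

s = """<div class='user-information__achievements-data' data-test-points-count>
30,850
</div>"""

# Illustrative pattern: lazily capture everything up to the next closing div.
pattern = r"data-test-points-count>(.*?)</div>"

# Without DOTALL, "." cannot cross the newline after ">", so nothing matches.
print(re.search(pattern, s))

# With DOTALL (alias re.S), "." also matches "\n".
m = re.search(pattern, s, flags=re.DOTALL)
print(m.group(1).strip())

# The inline flag (?s) at the start of the pattern has the same effect.
m_inline = re.search("(?s)" + pattern, s)
print(m_inline.group(1).strip())
```

The first search prints None, while the two DOTALL variants both capture the value on the line between the tags.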