findall() returns empty list on HTML file in Python regex
I am learning regex with Python and I am doing the baby names exercise from Google's Python tutorial on regular expressions. The HTML file, baby1990.html, is in a zip file that can be downloaded here: https://developers.google.com/edu/python/set-up ('Download Google Python Exercises')
The year is placed within <h3> tags. The HTML code is the following:
<h3 align="center">Popularity in 1990</h3>
I am using the following code to extract the year from the file:
import re
f = open('C:/Users/ALEX/MyFiles/JUPYTER NOTEBOOKS/google-python-exercises/babynames/baby1990.html', 'r')
strings = re.findall(r'<h3 align="center">Popularity in (/d/d/d/d)</h3>', f.read())
I have tested the pattern on the regex101 website and it works.
However, the 'strings' list returned is empty:
len(strings)
Out: 0
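I think the best way to match a year in a contextual string is to use re.search (or re.match, but only if your pattern describes the string from its very beginning, because re.match only matches at the start). Note also that a digit is written \d, with a backslash: /d in your pattern matches a literal slash followed by "d", which is why findall() finds nothing.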
For instance:
import re
tag = """<h3 align="center">Popularity in 1990</h3>"""
mo = re.search(r"Popularity in (\d{4})", tag)
year = mo.group(1) if mo else ""
print(year)
# -> 1990
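A short sketch of why re.search is the safer choice here: re.match only tries the pattern at index 0, and this string starts with "<h3", not "Popularity". The variable names below are illustrative.

```python
import re

tag = '<h3 align="center">Popularity in 1990</h3>'
pattern = r"Popularity in (\d{4})"

# re.match only tries to match at index 0; tag starts with "<h3", so this is None.
match_result = re.match(pattern, tag)

# re.search scans the whole string and returns the first match found anywhere.
search_result = re.search(pattern, tag)

# re.match succeeds once the pattern itself starts at the beginning of the string.
anchored = re.match(r'<h3 align="center">Popularity in (\d{4})</h3>', tag)

print(match_result)            # -> None
print(search_result.group(1))  # -> 1990
print(anchored.group(1))       # -> 1990
```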
Of course, if you want to find all matches, you need to use re.findall
...
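A minimal sketch of re.findall on this kind of file. The table-row layout in the sample string is an assumption about baby1990.html made for illustration only, so check it against the real file. With one group, findall returns a list of strings; with several groups, it returns a list of tuples.

```python
import re

# Hypothetical excerpt: the <tr>/<td> row layout is an ASSUMPTION about
# baby1990.html, used only to illustrate findall with several groups.
html = """<h3 align="center">Popularity in 1990</h3>
<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>
<tr align="right"><td>2</td><td>Christopher</td><td>Ashley</td>
"""

# One capturing group: findall returns a list of strings.
years = re.findall(r"Popularity in (\d{4})", html)

# Three capturing groups: findall returns a list of 3-tuples, one per match.
rows = re.findall(r"<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>", html)

print(years)  # -> ['1990']
print(rows)   # -> [('1', 'Michael', 'Jessica'), ('2', 'Christopher', 'Ashley')]
```

On the real file, the same calls would take the file contents (for example from f.read()) instead of the sample string.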
To check your Python regex, you can also try it online at https://regex101.com/
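The same check can be done locally. This sketch runs the pattern from the question, and the same pattern with backslashes, against the <h3> line quoted in the question, so no file is needed:

```python
import re

# The HTML line quoted in the question.
html = '<h3 align="center">Popularity in 1990</h3>'

# As written in the question: "/d" is a literal slash followed by "d", not a digit.
broken = re.findall(r'<h3 align="center">Popularity in (/d/d/d/d)</h3>', html)

# With backslashes, each "\d" matches one digit, so the group captures the year.
fixed = re.findall(r'<h3 align="center">Popularity in (\d\d\d\d)</h3>', html)

print(broken)  # -> []
print(fixed)   # -> ['1990']
```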