
findall() returns empty string on html file in Python REGEX

I am learning regex with Python and I am doing the baby names exercise from the Google tutorial on regex. The HTML file, baby1990.html, is in a zip archive that can be downloaded here: https://developers.google.com/edu/python/set-up ('Download Google Python Exercises').

The year is placed within tags. The HTML code is the following:

<h3 align="center">Popularity in 1990</h3>

I am using the following code to extract the year from the file:

f = open('C:/Users/ALEX/MyFiles/JUPYTER NOTEBOOKS/google-python-exercises/babynames/baby1990.html', 'r')

strings = re.findall(r'<h3 align="center">Popularity in (/d/d/d/d)</h3>', f.read())

I have tested the pattern on the RegularExpressions101 website and it works.

However, the 'strings' list returned is empty.

len(strings) outputs 0.

I think the best way to match a year in a contextual string is to use re.search or re.match.

For instance:

import re

tag = """<h3 align="center">Popularity in 1990</h3>"""

mo = re.search(r"Popularity in (\d{4})", tag)
year = mo.group(1) if mo else ""

print(year)
# -> 1990
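
One point worth keeping in mind when choosing between the two functions: re.match only tries the pattern at the very start of the string, while re.search scans the whole string. A small sketch of the difference, reusing the tag from the example above (the variable names anchored and scanned are just illustrative):

```python
import re

tag = '<h3 align="center">Popularity in 1990</h3>'

# re.match only tries the pattern at position 0, and the tag starts
# with "<h3 ...", so this pattern does not match.
anchored = re.match(r"Popularity in (\d{4})", tag)

# re.search scans the whole string and returns the first match.
scanned = re.search(r"Popularity in (\d{4})", tag)

print(anchored)          # None
print(scanned.group(1))  # 1990
```

So for text embedded in the middle of a line, as it is here, re.search is usually the one that fits unless the pattern also covers the beginning of the string.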

Of course, if you want to find all matches, you need to use re.findall ...
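
As a rough sketch of that, and assuming the pattern in the question is exactly as posted, a likely reason for the empty list is that it uses /d (a literal slash followed by the letter d) where \d (a digit) was intended. The html string below is only a stand-in for f.read(), since the real file contains much more markup:

```python
import re

# Stand-in for the file contents returned by f.read() (assumption).
html = '<h3 align="center">Popularity in 1990</h3>'

# Pattern as posted: "/d" matches a literal slash followed by "d",
# so nothing in the HTML matches.
as_posted = re.findall(r'<h3 align="center">Popularity in (/d/d/d/d)</h3>', html)

# With backslashes, "\d" matches a digit; findall returns the
# contents of the single capture group for each match.
fixed = re.findall(r'<h3 align="center">Popularity in (\d\d\d\d)</h3>', html)

print(as_posted)  # []
print(fixed)      # ['1990']
```

With one capture group in the pattern, re.findall returns a list of the captured strings, so the year would be fixed[0] when the list is not empty.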

To check your Python regex, you can also try it online with https://regex101.com/
