findall() returns an empty list on an html file in Python regex

Question

I am learning regex with Python and I am doing the baby names exercise from the Google tutorial on regex. The html file (baby1990.html) is in a zip archive that can be downloaded here: https://developers.google.com/edu/python/set-up ('Download Google Python Exercises').

The year is placed within <h3> tags. The html code is the following:

<h3 align="center">Popularity in 1990</h3>

I am using the following code to extract the year from the file:

f = open('C:/Users/ALEX/MyFiles/JUPYTER NOTEBOOKS/google-python-exercises/babynames/baby1990.html', 'r')

strings = re.findall(r'<h3 align="center">Popularity in (/d/d/d/d)</h3>', f.read())

I have tested the pattern on the RegularExpressions101 website and it works.

However, the 'strings' list returned is empty: len(strings) gives 0.

Answer 1

I think the best way to match a year in a contextual string is to use re.search or re.match.

For instance:

import re

tag = """<h3 align="center">Popularity in 1990</h3>"""

mo = re.search(r"Popularity in (\d{4})", tag)
year = mo.group(1) if mo else ""

print(year)
# -> 1990

Of course, if you want to find all matches, you need to use re.findall.

To check your Python regex, you can also try it online at https://regex101.com/
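As a sketch of the re.findall variant: if the pattern in the question really is written as shown, `/d` matches a literal slash followed by the letter d, whereas the digit class is `\d`. That alone would explain the empty list. The `html` string below is a stand-in for `f.read()` and assumes the file contains the tag quoted in the question.

```python
import re

# Stand-in for f.read(); assumes the file contains the tag from the question.
html = '<h3 align="center">Popularity in 1990</h3>'

# Pattern as written in the question: "/d" is a literal "/" followed by "d".
broken = re.findall(r'<h3 align="center">Popularity in (/d/d/d/d)</h3>', html)

# Same pattern using the digit class "\d" instead.
fixed = re.findall(r'<h3 align="center">Popularity in (\d\d\d\d)</h3>', html)

print(broken)  # -> []
print(fixed)   # -> ['1990']
```

With a single capture group, re.findall returns a list of the captured strings, so the year would be `fixed[0]` when the list is non-empty.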

Answer 1 (0, accepted, 2017-01-10 20:01:39)
