python re.findall returns an empty list

Question

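I wrote the code below to grab a string of text from a website's page source. As the comments in the code note, the first findall works fine, while the second returns an empty list. I'm trying to get the name (Kendall Easley) from the HTML shown below the code.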

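import re
import urllib2  # Python 2; in Python 3 this module is urllib.request

# links (URLs to fetch) and headers (request headers) are defined earlier in the script
for j in links:
    req = urllib2.Request(j, None, headers)
    response = urllib2.urlopen(req)
    page = response.read()  # page source as a string
    org = re.findall(r'(?<=<meta content=").*?(?=" property="og:title")', page)
    print(org)  # works
    name = re.findall(r'(?<=ic_only=64" title=").*(?="><img alt=)', page)
    print(name)  # prints an empty list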

<a data-popup="{&quot;type&quot;:&quot;profile&quot;}" href="/149855/profile/10525304/display_profile?pic_only=64" title="Kendall Easley"><img alt="Profile Photo" class="user-profile-pic profile_pic_64" height="64" src="https://orgsync.com/assets/icons/accounts/profile_pic_blank_64.gif" width="64" /></a>
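As a quick check (a Python 3 sketch, not from the original thread): the second pattern does match the snippet exactly as pasted above, so an empty list suggests the HTML the script actually downloads is not identical to this snippet. The `variant` string is a hypothetical example of such a difference (a line break before `<img>`, which `.` does not cross), and `tolerant` is an illustrative alternative pattern that captures the title with a group and allows whitespace at that spot.

```python
import re

# The anchor tag exactly as pasted in the question.
snippet = ('<a data-popup="{&quot;type&quot;:&quot;profile&quot;}" '
           'href="/149855/profile/10525304/display_profile?pic_only=64" '
           'title="Kendall Easley"><img alt="Profile Photo" '
           'class="user-profile-pic profile_pic_64" height="64" '
           'src="https://orgsync.com/assets/icons/accounts/profile_pic_blank_64.gif" '
           'width="64" /></a>')

# Hypothetical variant: the same tag, but with a line break before <img>.
variant = snippet.replace('"><img', '">\n<img')

# The question's second pattern.
strict = r'(?<=ic_only=64" title=").*(?="><img alt=)'
# Illustrative alternative: capture the title value, allow whitespace before <img.
tolerant = r'pic_only=64" title="([^"]*)"\s*>\s*<img'

print(re.findall(strict, snippet))    # ['Kendall Easley']
print(re.findall(strict, variant))    # []
print(re.findall(tolerant, variant))  # ['Kendall Easley']
```

Printing a slice of `page` around `title=` in the real script would show which difference, if any, is actually present.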

Answer 1

I'm not sure I fully understand your question, but this extracts the name from that HTML string. Hope it helps.

>>> import re
>>> 
>>> html_string = """<a data-popup="{&quot;type&quot;:&quot;profile&quot;}" href="/149855/profile/10525304/display_profile?pic_only=64" title="Kendall Easley"><img alt="Profile Photo" class="user-profile-pic profile_pic_64" height="64" src="https://orgsync.com/assets/icons/accounts/profile_pic_blank_64.gif" width="64" /></a>"""
>>> 
>>> name = re.findall(r".*title=\"(\w+\s+\w+)", html_string)
>>> 
>>> name
['Kendall Easley']

Edit: note that I put triple quotes around the HTML string.
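The pattern above works for this single tag, but `\w+\s+\w+` only captures names of exactly two words, and the leading greedy `.*` jumps to the last `title=` in a one-line string. One alternative (a Python 3 sketch, not part of this answer) is the standard library's `html.parser.HTMLParser`, which reads the `title` attribute directly. The `display_profile` filter and the second, three-word name are hypothetical, added only for illustration.

```python
import re
from html.parser import HTMLParser


class ProfileTitleParser(HTMLParser):
    """Collects the title attribute of <a> tags that look like profile links."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)  # attrs is a list of (name, value) pairs
        href = attributes.get("href") or ""
        title = attributes.get("title")
        # Assumption: profile links contain "display_profile" in their href.
        if "display_profile" in href and title:
            self.names.append(title)


# The question's tag plus a hypothetical second profile with a three-word name.
html_string = (
    '<a href="/149855/profile/10525304/display_profile?pic_only=64" '
    'title="Kendall Easley"><img alt="Profile Photo" /></a>'
    '<a href="/149855/profile/1/display_profile?pic_only=64" '
    'title="Mary Ann Smith"><img alt="Profile Photo" /></a>'
)

parser = ProfileTitleParser()
parser.feed(html_string)
parser.close()
print(parser.names)  # ['Kendall Easley', 'Mary Ann Smith']

# The answer's pattern on the same input finds only the last title, cut to two words.
print(re.findall(r".*title=\"(\w+\s+\w+)", html_string))  # ['Mary Ann']
```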

Answer 2

After the first re.findall(), you have already read the text, and the read position is at the end of the text.

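You have to call seek(0) or something similar on your text (I'm reading .txt files, and that is what works for me with re.findall), and then call re.findall again. Otherwise it searches from the end of the text, where of course there is nothing.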

(P.S. I'm new to Python; I've been reading about it for four weeks.)
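The `seek(0)` advice applies to file-like objects: `read()` moves the read position to the end, so a second `read()` returns an empty string until you rewind. The Python 3 sketch below uses `io.StringIO` as a stand-in for an open text file, and the short HTML strings are made-up examples. A plain string, such as `page` in the question, has no read position, so calling `re.findall` on it twice gives the same result both times.

```python
import io
import re

# io.StringIO stands in for an open text file (hypothetical contents).
f = io.StringIO('<a href="x?pic_only=64" title="Kendall Easley"><img alt="P"></a>')
pattern = r'title="([^"]*)"'

first = re.findall(pattern, f.read())   # read() consumes the whole "file"
second = re.findall(pattern, f.read())  # nothing left, read() returns ''
f.seek(0)                               # rewind to the start
third = re.findall(pattern, f.read())

print(first, second, third)  # ['Kendall Easley'] [] ['Kendall Easley']

# A str, by contrast, has no read position: searching it twice works.
page = '<a title="Kendall Easley">'
print(re.findall(pattern, page) == re.findall(pattern, page))  # True
```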

Answer 1: score 0
Answer 2: score -1, 2018-10-11 10:41:17
