Python `beautifulsoup` extraction on links lacking `class`, other attributes?
Quick question (I am not very familiar with Python's BeautifulSoup()): if I have the following element, how can I extract "1 comment" (or "2 comments", etc.)? There is no class (or id, or other attributes) on that "a" tag.
<td class="subtext">
<a href="item?id=22823679">1 comment</a>
</td>
You can use the select method to apply a CSS selector (much like querySelector in JavaScript) to your HTML, and then take the contents of the elements you found:
elements = soup.select(".subtext a")
[x.contents for x in elements]
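A self-contained sketch of this approach, assuming the markup from the question sits inside a table (the `html` string and the `comment_labels` name are illustrative and not part of the original answer). It uses `get_text(strip=True)` rather than `.contents`, so each result is a plain string instead of a list of child nodes:

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the surrounding table is an assumption.
html = """
<table><tr>
<td class="subtext">
<a href="item?id=22823679">1 comment</a>
</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# ".subtext a" matches every <a> nested inside an element with class "subtext".
elements = soup.select(".subtext a")

# get_text(strip=True) returns the link text without surrounding whitespace.
comment_labels = [a.get_text(strip=True) for a in elements]
print(comment_labels)
```

If the page contains several such cells, `comment_labels` will hold one string per link found.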
How about the following, tested with a local HTML file:
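from bs4 import BeautifulSoup

# Path to the local test file (despite the name, this is not a web URL)
url = "D:\\Temp\\example.html"
with open(url, "r") as page:
    contents = page.read()

soup = BeautifulSoup(contents, 'html.parser')
# select() returns a list of all elements matching the CSS selector
element = soup.select('td.subtext')
# strip=True removes the newlines around the link text
value = element[0].get_text(strip=True)
print(value)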
example.html
<html>
<head></head>
<body>
<td class="subtext">
<a href="item?id=22823679">1 comment</a>
</td>
</body>
</html>
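As a variation on the answers above, the link can also be matched by its own `href` instead of by the class of its parent cell, which may help if the `td` had no class either. This is only a sketch: the `a[href^="item?id="]` selector assumes every comment link starts with `item?id=`, as in the example, and the `html` string and `label` name are illustrative.

```python
from bs4 import BeautifulSoup

# Same markup as example.html, inlined so the snippet runs on its own.
html = """
<html><head></head><body>
<td class="subtext">
<a href="item?id=22823679">1 comment</a>
</td>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# [href^="..."] is a CSS attribute selector: href must start with the given prefix.
link = soup.select_one('a[href^="item?id="]')

# select_one() returns None when nothing matches, so guard before reading the text.
label = link.get_text(strip=True) if link is not None else None
print(label)
```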