Python `BeautifulSoup`: extracting text from links (`a` tags) lacking `class` or other attributes?

Quick question (I am not very familiar with Python's `BeautifulSoup()`): if I have the following element, how can I extract "1 comment" (or "2 comments", etc.)? There is no `class` (or `id`, or other identifying attribute) on that `a` tag.

<td class="subtext">
  <a href="item?id=22823679">1&nbsp;comment</a>
</td>

You can use the `select` method to apply a CSS selector (like `querySelectorAll` in JavaScript) to your HTML, and then take the `contents` of the elements it finds:

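# CSS selector: every <a> nested inside an element with class "subtext"
elements = soup.select(".subtext a")
# each item is that <a>'s list of child nodes, e.g. ['1\xa0comment']
[x.contents for x in elements]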
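For a version that runs on its own, here is a sketch of the same `select` approach with the question's markup inlined as a string (the `HTML` constant and the `comment_labels` helper are hypothetical names, not part of the answer above). It uses `get_text()` instead of `contents`, so each match yields a plain string rather than a list of child nodes, and it replaces the non-breaking space that `&nbsp;` decodes to.

```python
from bs4 import BeautifulSoup

# Markup from the question, inlined so the sketch runs without a network call
HTML = """
<td class="subtext">
  <a href="item?id=22823679">1&nbsp;comment</a>
</td>
"""


def comment_labels(html: str) -> list[str]:
    """Return the text of every <a> nested inside an element with class "subtext"."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() joins the child strings; &nbsp; is decoded to "\xa0",
    # so swap it for an ordinary space
    return [a.get_text().replace("\xa0", " ") for a in soup.select(".subtext a")]


print(comment_labels(HTML))
```

Because the selector targets the `td`'s class and then descends to `a`, the link itself never needs a class or id.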

How about the following? Tested with a local HTML file:

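from bs4 import BeautifulSoup

path = "D:\\Temp\\example.html"  # a local file path, not a URL

with open(path, "r", encoding="utf-8") as page:
    soup = BeautifulSoup(page.read(), "html.parser")

# select() returns a list of every element matching the CSS selector
element = soup.select("td.subtext")
# strip=True trims the newlines and indentation around the link text
value = element[0].get_text(strip=True)
print(value)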
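Two other ways to reach an attribute-less `a` tag, sketched below under the assumption that the markup matches the question's sample: navigate down from the parent `td` (which does have a class) with `find`, or match the link by its `href` pattern using a compiled regex. The function names are hypothetical, and the `item?id=<digits>` pattern is inferred from the single sample link, so it may not hold on every page.

```python
import re
from typing import Optional

from bs4 import BeautifulSoup

# Sample markup from the question
HTML = '<td class="subtext"><a href="item?id=22823679">1&nbsp;comment</a></td>'


def link_text_via_parent(html: str) -> Optional[str]:
    """Locate the <a> through its parent cell, which does carry a class."""
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.find("td", class_="subtext")
    # find("a") returns the first <a> descendant; the link itself needs no attributes
    link = cell.find("a") if cell else None
    return link.get_text(strip=True).replace("\xa0", " ") if link else None


def link_text_via_href(html: str) -> Optional[str]:
    """Locate the <a> by the pattern of its href instead of a class."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: comment links look like "item?id=<digits>", as in the sample
    link = soup.find("a", href=re.compile(r"^item\?id=\d+"))
    return link.get_text(strip=True).replace("\xa0", " ") if link else None


print(link_text_via_parent(HTML))
print(link_text_via_href(HTML))
```

The parent-based lookup is usually the sturdier choice when the surrounding structure is stable; the `href` filter helps when the link can appear outside that cell.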

example.html

<html>
    <head></head>
    <body>
        <td class="subtext">
            <a href="item?id=22823679">1&nbsp;comment</a>
        </td>
    </body>
</html>
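If only the number of comments is needed, the extracted label can be parsed afterwards. This is a minimal sketch assuming the link text always follows the pattern "<number> comment" or "<number> comments"; `comment_count` is a hypothetical helper name. In Python 3, `\s` in a `str` pattern also matches the non-breaking space (`\xa0`) that `&nbsp;` decodes to, so no replacement is needed first.

```python
import re
from typing import Optional

# "<digits><whitespace>comment" with an optional plural "s";
# \s covers "\xa0" because the pattern is a Unicode str
COUNT_RE = re.compile(r"(\d+)\s+comments?")


def comment_count(label: str) -> Optional[int]:
    """Return the comment count from a label like "2 comments", or None if absent."""
    match = COUNT_RE.search(label)
    return int(match.group(1)) if match else None


print(comment_count("1\xa0comment"))
print(comment_count("2 comments"))
```

Returning `None` rather than `0` keeps "no matching label" distinguishable from an actual count.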

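Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.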

© 2020-2024 STACKOOM.COM