
How to get the name of the tags using BeautifulSoup in Python

I have some sample HTML code, as follows:

<tr class="searchResultsItem" data-id="455">
<td class="searchResultsImage">
<a href="/pets/2-months-english-bulldog-puppy" title="English BullDog">
....
..
..
<tr class="searchResultsItem" data-id="456">
<td class="searchResultsImage">
<a href="/pets/3-months-french-bulldog-puppy" title="French BullDog">
....
..
..

The HTML code goes on like this. I have succeeded in getting the content from between the tags, but I need the tag titles/names. To be specific, I need to build a list that goes 455, 456, etc. I searched Stack Overflow and found examples like the one below:

soup.body.find('tr')['data-id']

They don't work.

You can select all <tr> elements that have a data-id= attribute and then search for the <a> inside each row.

For example:

txt = '''
<tr class="searchResultsItem" data-id="455">
<td class="searchResultsImage">
<a href="/pets/2-months-english-bulldog-puppy" title="English BullDog">
</td>
</tr>

<tr class="searchResultsItem" data-id="456">
<td class="searchResultsImage">
<a href="/pets/3-months-french-bulldog-puppy" title="French BullDog">
</td>
</tr>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')

for tr in soup.select('tr[data-id]'):
    print(tr['data-id'], tr.find('a')['title'])

Prints:

455 English BullDog
456 French BullDog
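Since the goal stated in the question is a list like 455, 456, the same selector can be collected with a list comprehension. A minimal sketch, reusing a shortened version of the question's markup:

```python
from bs4 import BeautifulSoup

txt = '''
<tr class="searchResultsItem" data-id="455">
<td class="searchResultsImage">
<a href="/pets/2-months-english-bulldog-puppy" title="English BullDog">
</td>
</tr>
<tr class="searchResultsItem" data-id="456">
<td class="searchResultsImage">
<a href="/pets/3-months-french-bulldog-puppy" title="French BullDog">
</td>
</tr>'''

soup = BeautifulSoup(txt, 'html.parser')

# Collect every data-id value into one list
ids = [tr['data-id'] for tr in soup.select('tr[data-id]')]
print(ids)  # → ['455', '456']
```

Note that the values come back as strings; wrap them in `int(...)` inside the comprehension if numeric IDs are needed.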

You didn't describe what "don't work" means.

Maybe you have another <tr> without data-id; using only find('tr'), you get that element, and when you try to access ['data-id'], it can't be found.

But you can use find('tr', {'data-id': True}) or find_all('tr', {'data-id': True}) to get only the elements which have data-id.

In the example, I added a <tr> without data-id as the first element to show that it can raise an error.

text = '''
<tr></tr>
<tr data-id="455"></tr>
<tr data-id="456"></tr>
'''

from bs4 import BeautifulSoup
soup = BeautifulSoup(text, 'html.parser')

#for tr in soup.find_all('tr', {'data-id': True}):
#    print(tr['data-id'])

print("--- with {'data-id': True} ---")
print('html   :', soup.find('tr', {'data-id': True}))
print('data-id:', soup.find('tr', {'data-id': True})['data-id'])

print("--- without {'data-id': True} ---")
print('html   :', soup.find('tr'))
print('data-id:', soup.find('tr')['data-id'])

Result:

--- with {'data-id': True} ---
html   : <tr data-id="455"></tr>
data-id: 455
--- without {'data-id': True} ---
html   : <tr></tr>
Traceback (most recent call last):
...
KeyError: 'data-id'  
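Beyond filtering with {'data-id': True}, another option (not in the original answer) is Tag.get(), which, like dict.get(), returns None or a supplied default instead of raising KeyError when the attribute is missing. A minimal sketch with the same test markup:

```python
from bs4 import BeautifulSoup

text = '''
<tr></tr>
<tr data-id="455"></tr>
<tr data-id="456"></tr>
'''

soup = BeautifulSoup(text, 'html.parser')

# The first <tr> has no data-id attribute
first = soup.find('tr')

# .get() returns None instead of raising KeyError
print(first.get('data-id'))             # → None
# A fallback default can be supplied
print(first.get('data-id', 'missing'))  # → missing
```

This is handy when iterating over mixed rows where only some carry the attribute.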

