
How to get the name of the tags using BeautifulSoup in Python

I have some sample HTML code, as follows:

<tr class="searchResultsItem" data-id="455">
<td class="searchResultsImage">
<a href="/pets/2-months-english-bulldog-puppy" title="English BullDog">
....
..
..
<tr class="searchResultsItem" data-id="456">
<td class="searchResultsImage">
<a href="/pets/3-months-french-bulldog-puppy" title="French BullDog">
....
..
..

The HTML code goes on like this. I have succeeded in getting the content from between the tags, but I need the tag titles/names. To be specific, I need to build a list that goes 455, 456, etc. I searched Stack Overflow and found examples like the one below:

soup.body.find('tr')['data-id']

They don't work.

You can select all <tr> elements that have a data-id= attribute and then search for the <a> inside each row.

For example:

txt = '''
<tr class="searchResultsItem" data-id="455">
<td class="searchResultsImage">
<a href="/pets/2-months-english-bulldog-puppy" title="English BullDog">
</td>
</tr>

<tr class="searchResultsItem" data-id="456">
<td class="searchResultsImage">
<a href="/pets/3-months-french-bulldog-puppy" title="French BullDog">
</td>
</tr>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')

for tr in soup.select('tr[data-id]'):
    print(tr['data-id'], tr.find('a')['title'])

Prints:

455 English BullDog
456 French BullDog
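Since the goal stated in the question is a list like 455, 456, the same selector can be collected with a list comprehension. A minimal sketch, reusing a shortened version of the question's markup:

```python
from bs4 import BeautifulSoup

txt = '''
<tr class="searchResultsItem" data-id="455">
<td class="searchResultsImage">
<a href="/pets/2-months-english-bulldog-puppy" title="English BullDog">
</td>
</tr>
<tr class="searchResultsItem" data-id="456">
<td class="searchResultsImage">
<a href="/pets/3-months-french-bulldog-puppy" title="French BullDog">
</td>
</tr>'''

soup = BeautifulSoup(txt, 'html.parser')

# Collect every data-id value into one list
ids = [tr['data-id'] for tr in soup.select('tr[data-id]')]
print(ids)  # → ['455', '456']
```

Note that the values come back as strings; wrap them in `int(...)` inside the comprehension if numeric IDs are needed.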

You didn't describe what "don't work" means.

Maybe you have another <tr> without data-id; using only find('tr'), you get that element, and when you try to access ['data-id'], it can't be found.

But you can use find('tr', {'data-id': True}) or find_all('tr', {'data-id': True}) to get only the elements which have data-id.

In the example, I added a <tr> without data-id as the first element to show that it can raise an error.

text = '''
<tr></tr>
<tr data-id="455"></tr>
<tr data-id="456"></tr>
'''

from bs4 import BeautifulSoup
soup = BeautifulSoup(text, 'html.parser')

#for tr in soup.find_all('tr', {'data-id': True}):
#    print(tr['data-id'])

print("--- with {'data-id': True} ---")
print('html   :', soup.find('tr', {'data-id': True}))
print('data-id:', soup.find('tr', {'data-id': True})['data-id'])

print("--- without {'data-id': True} ---")
print('html   :', soup.find('tr'))
print('data-id:', soup.find('tr')['data-id'])

Result:

--- with {'data-id': True} ---
html   : <tr data-id="455"></tr>
data-id: 455
--- without {'data-id': True} ---
html   : <tr></tr>
Traceback (most recent call last):
...
KeyError: 'data-id'  
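Beyond filtering with {'data-id': True}, another option (not in the original answer) is Tag.get(), which, like dict.get(), returns None or a supplied default instead of raising KeyError when the attribute is missing. A minimal sketch with the same test markup:

```python
from bs4 import BeautifulSoup

text = '''
<tr></tr>
<tr data-id="455"></tr>
<tr data-id="456"></tr>
'''

soup = BeautifulSoup(text, 'html.parser')

# The first <tr> has no data-id attribute
first = soup.find('tr')

# .get() returns None instead of raising KeyError
print(first.get('data-id'))             # → None
# A fallback default can be supplied
print(first.get('data-id', 'missing'))  # → missing
```

This is handy when iterating over mixed rows where only some carry the attribute.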

