How to get the name of the tags using BeautifulSoup in Python
I have sample HTML code as follows:
<tr class="searchResultsItem" data-id="455">
<td class="searchResultsImage">
<a href="/pets/2-months-english-bulldog-puppy" title="English BullDog">
....
..
..
<tr class="searchResultsItem" data-id="456">
<td class="searchResultsImage">
<a href="/pets/3-months-french-bulldog-puppy" title="French BullDog">
....
..
..
The HTML code goes on, and I have succeeded in getting the content from between the tags, but I need the tag titles/names. To be specific, I need to make a list of the data-id values, like 455, 456, etc. I searched Stack Overflow and found examples like the one below:
soup.body.find('tr')['data-id']
They don't work.
You can select all <tr> elements that have a data-id attribute and then search for the <a> inside each row.
For example:
txt = '''
<tr class="searchResultsItem" data-id="455">
<td class="searchResultsImage">
<a href="/pets/2-months-english-bulldog-puppy" title="English BullDog">
</td>
</tr>
<tr class="searchResultsItem" data-id="456">
<td class="searchResultsImage">
<a href="/pets/3-months-french-bulldog-puppy" title="French BullDog">
</td>
</tr>'''
from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')
# select only the rows that carry a data-id attribute
for tr in soup.select('tr[data-id]'):
    print(tr['data-id'], tr.find('a')['title'])
Prints:
455 English BullDog
456 French BullDog
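Since the question asks for a list like 455, 456, here is a minimal sketch that collects the data-id values into a Python list instead of printing them. The sample markup is adapted from the snippet above (wrapped in a <table> and with the <a> tags closed), and the variable names html and ids are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup adapted from the question; real pages will contain more rows.
html = '''
<table>
<tr class="searchResultsItem" data-id="455">
<td class="searchResultsImage">
<a href="/pets/2-months-english-bulldog-puppy" title="English BullDog"></a>
</td>
</tr>
<tr class="searchResultsItem" data-id="456">
<td class="searchResultsImage">
<a href="/pets/3-months-french-bulldog-puppy" title="French BullDog"></a>
</td>
</tr>
</table>'''

soup = BeautifulSoup(html, 'html.parser')

# The CSS selector tr[data-id] matches only rows that have the attribute,
# so indexing with ['data-id'] cannot fail here.
ids = [tr['data-id'] for tr in soup.select('tr[data-id]')]
print(ids)
```

Note that attribute values come back as strings; wrap them in int(...) if you need numbers.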
You didn't describe what "don't work" means.
Maybe you have other <tr> elements without data-id. Using only find('tr') returns the first <tr> in the document, and if that element has no data-id, then ['data-id'] can't find it and raises a KeyError.
But you can use find('tr', {'data-id': True}) or find_all('tr', {'data-id': True}) to get only the elements that have data-id.
In the example, I added a <tr> without data-id as the first element to show that it can raise an error.
text = '''
<tr></tr>
<tr data-id="455"></tr>
<tr data-id="456"></tr>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(text, 'html.parser')
#for tr in soup.find_all('tr', {'data-id': True}):
# print(tr['data-id'])
print("--- with {'data-id': True} ---")
print('html :', soup.find('tr', {'data-id': True}))
print('data-id:', soup.find('tr', {'data-id': True})['data-id'])
print("--- without {'data-id': True} ---")
print('html :', soup.find('tr'))
print('data-id:', soup.find('tr')['data-id'])
Result:
--- with {'data-id': True} ---
html : <tr data-id="455"></tr>
data-id: 455
--- without {'data-id': True} ---
html : <tr></tr>
Traceback (most recent call last):
...
KeyError: 'data-id'
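If you would rather not filter the rows up front, another option is to read the attribute with the tag's .get() method, which returns None for a missing attribute instead of raising KeyError. This is only a sketch reusing the sample text from above; the names first, missing, all_ids and present are illustrative.

```python
from bs4 import BeautifulSoup

text = '''
<tr></tr>
<tr data-id="455"></tr>
<tr data-id="456"></tr>
'''

soup = BeautifulSoup(text, 'html.parser')

# find('tr') returns the first row, which has no data-id;
# .get() returns None here instead of raising KeyError.
first = soup.find('tr')
missing = first.get('data-id')

# Read the attribute from every row, then drop the rows that lack it.
all_ids = [tr.get('data-id') for tr in soup.find_all('tr')]
present = [value for value in all_ids if value is not None]

print(missing)
print(all_ids)
print(present)
```

Filtering with {'data-id': True} is usually tidier, but .get() can be handy when you still want to process the rows that lack the attribute.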