How to get the text of certain elements with BeautifulSoup in Python
I have this kind of HTML code:
<tr>
<td class="a">...</td>
<td class="a">...</td>
<td class="a">
<p>
<sup>
Name Name Name
</sup>
</p>
</td>
<td class="a">...</td>
<td class="a">...</td>
<td class="a">
<p>
<sup>25.01.1980</sup>
</p>
</td>
<td class="a">...</td>
<td class="a">...</td>
</tr>
<tr>...</tr>
<tr>...</tr>
I need to get the text of the 3rd and 5th td of every tr.
Apparently this doesn't work :)
from bs4 import BeautifulSoup
import index
soup = BeautifulSoup(index.index_doc, 'lxml')
for i in soup.find_all('tr')[2:]:
    print(i[2].text, i[4].text)
You could use CSS selectors and the pseudo-class :nth-of-type() to select your elements (I assumed you need the date, so I selected the 6th td):
data = [e.get_text(strip=True) for e in soup.select('tr td:nth-of-type(3),tr td:nth-of-type(6)')]
And to get a list of tuples:
# pair each name (even positions) with the date that follows it (odd positions)
list(zip(data[::2], data[1::2]))
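Pairing values out of one flat list only lines up if every row contains both cells. As a rough sketch of an alternative, the selection can be done row by row, so each name stays with the date from the same tr. The sample markup, the names Alice and Bob, and the variable names below are made up for illustration, and the date is again assumed to sit in the 6th td:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, shaped like the rows in the question.
html = """
<table>
  <tr>
    <td>a</td><td>b</td><td><p><sup> Alice </sup></p></td>
    <td>d</td><td>e</td><td><p><sup>25.01.1980</sup></p></td>
  </tr>
  <tr><td>only one cell</td></tr>
  <tr>
    <td>a</td><td>b</td><td><p><sup>Bob</sup></p></td>
    <td>d</td><td>e</td><td><p><sup>01.02.1990</sup></p></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

pairs = []
for row in soup.select("tr"):
    # select_one returns None when the row has no matching cell
    name = row.select_one("td:nth-of-type(3)")
    date = row.select_one("td:nth-of-type(6)")
    if name is None or date is None:
        continue  # skip rows that do not contain both cells
    pairs.append((name.get_text(strip=True), date.get_text(strip=True)))

print(pairs)
```

Rows that are too short are skipped here; whether to skip them or to record a placeholder instead depends on what you need downstream.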
from bs4 import BeautifulSoup
html = '''
<tr>
<td class="a">...</td>
<td class="a">...</td>
<td class="a">
<p>
<sup>
Name Name Name
</sup>
</p>
</td>
<td class="a">...</td>
<td class="a">...</td>
<td class="a">
<p>
<sup>25.01.1980</sup>
</p>
</td>
<td class="a">...</td>
<td class="a">...</td>
</tr>
<tr>...</tr>
<tr>...</tr>
'''
soup = BeautifulSoup(html, 'html.parser')
data = [e.get_text(strip=True) for e in soup.select('tr td:nth-of-type(3),tr td:nth-of-type(6)')]
# pair each name (even positions) with the date that follows it (odd positions)
list(zip(data[::2], data[1::2]))
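For completeness, the index-based idea from the question can also be made to work. Subscripting a Tag, as in i[2], looks up an HTML attribute (the same way tag['class'] does), so an integer key raises KeyError; the cells have to be collected into a list first. A minimal sketch, assuming the same cell positions as above (0-based indexes 2 and 5); the helper name cell_texts and the sample markup are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the second row is deliberately too short.
html = """
<table>
  <tr>
    <td>a</td><td>b</td><td><p><sup>Name Name Name</sup></p></td>
    <td>d</td><td>e</td><td><p><sup>25.01.1980</sup></p></td>
  </tr>
  <tr><td>x</td><td>y</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def cell_texts(row, indexes):
    """Return the stripped text of the row's direct td children at the
    given 0-based indexes, or None if the row has too few cells."""
    cells = row.find_all("td", recursive=False)
    if len(cells) <= max(indexes):
        return None
    return tuple(cells[i].get_text(strip=True) for i in indexes)


rows = [cell_texts(tr, (2, 5)) for tr in soup.find_all("tr")]
rows = [r for r in rows if r is not None]  # drop rows that were too short

print(rows)
```

Passing (2, 4) instead of (2, 5) would pick the 3rd and 5th cells, as the question literally asks.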