How to get the text of certain elements with BeautifulSoup in Python

Question

I have this kind of HTML code:

<tr>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>
        Name Name Name
      </sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>25.01.1980</sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
</tr>
<tr>...</tr>
<tr>...</tr>

I need to get the text of the 3rd and 5th td of every tr.

Apparently this doesn't work :)

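from bs4 import BeautifulSoup
import index  # the asker's own module that holds the HTML string

soup = BeautifulSoup(index.index_doc, 'lxml')

# Why this fails: [2:] skips the first two <tr> rows instead of picking cells,
# and i[2] looks up an HTML attribute named 2 on the <tr> tag (KeyError),
# not its 3rd <td> child.
for i in soup.find_all('tr')[2:]:
    print(i[2].text, i[4].text)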
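A BeautifulSoup Tag cannot be indexed like a list of its cells: tag[key] reads an HTML attribute. A minimal sketch of the asker's per-row idea that does work, assuming the built-in html.parser, first collects each row's td children and then indexes that list. The helper name cell_texts and its 0-based positions parameter are hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup from the question; the bare <tr>...</tr> rows contain no cells.
html = '''
<tr>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>
        Name Name Name
      </sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>25.01.1980</sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
</tr>
<tr>...</tr>
<tr>...</tr>
'''


def cell_texts(markup, positions):
    """Return one tuple per <tr> with the stripped text of the cells at the 0-based positions."""
    soup = BeautifulSoup(markup, 'html.parser')
    result = []
    for row in soup.find_all('tr'):
        # Index the list of the row's <td> children, not the <tr> Tag itself.
        cells = row.find_all('td', recursive=False)
        if len(cells) <= max(positions):
            continue  # skip rows that are too short, such as the placeholders
        result.append(tuple(cells[p].get_text(strip=True) for p in positions))
    return result


print(cell_texts(html, (2, 4)))  # 3rd and 5th cell, as asked
print(cell_texts(html, (2, 5)))  # 3rd and 6th cell (name and date)
```

Skipping short rows keeps the loop from raising IndexError on rows that lack the requested cells.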

Answer 1

You could use CSS selectors and the :nth-of-type() pseudo-class to select your elements (I assumed you need the date, so I selected the 6th td):

data = [e.get_text(strip=True) for e in soup.select('tr td:nth-of-type(3),tr td:nth-of-type(6)')]

And to get a list of tuples (one per row):

# data alternates name, date, name, date, ... so pair every other item
list(zip(data[::2], data[1::2]))
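Because select() returns the matches of both selectors as one flat list in document order, the values alternate between the 3rd-cell and 6th-cell texts. The short sketch below uses made-up values (not from the question) to show that stepping by 2 yields one tuple per row, while pairing each item with its neighbour also produces spurious tuples that span two rows.

```python
# Hypothetical flat result of soup.select(...) for two rows (made-up values).
data = ['Name A', '01.01.1990', 'Name B', '02.02.1991']

# Step-2 slices split the list into names and dates, then zip pairs them per row.
pairs = list(zip(data[::2], data[1::2]))
print(pairs)  # [('Name A', '01.01.1990'), ('Name B', '02.02.1991')]

# Sliding pairs also combine a date with the following row's name.
overlapping = list(zip(data, data[1:]))
print(overlapping)  # [('Name A', '01.01.1990'), ('01.01.1990', 'Name B'), ('Name B', '02.02.1991')]
```

This pairing assumes every row has both cells; if one is missing, the alternation shifts and later tuples are misaligned.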

Example

from bs4 import BeautifulSoup

html = '''
<tr>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>
        Name Name Name
      </sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>25.01.1980</sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
</tr>
<tr>...</tr>
<tr>...</tr>
'''
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid the "no parser" warning

data = [e.get_text(strip=True) for e in soup.select('tr td:nth-of-type(3),tr td:nth-of-type(6)')]

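# data alternates name, date, name, date, ... so pair every other item
print(list(zip(data[::2], data[1::2])))  # [('Name Name Name', '25.01.1980')]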
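If some rows might lack one of the cells, a variant that avoids relying on the flat list alternating is to run the selectors inside each row. This is a minimal sketch, assuming a hypothetical two-row table in the same layout as the question's markup; select_one() returns None when a row has no matching cell, so such rows are skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table in the same layout as the question (made-up values).
html = '''
<table>
  <tr>
    <td>a</td><td>b</td><td><p><sup> Name One </sup></p></td>
    <td>c</td><td>d</td><td><p><sup>25.01.1980</sup></p></td>
  </tr>
  <tr>
    <td>a</td><td>b</td><td><p><sup> Name Two </sup></p></td>
    <td>c</td><td>d</td><td><p><sup>03.04.1975</sup></p></td>
  </tr>
  <tr></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

rows = []
for tr in soup.select('tr'):
    # Search only inside the current row, so each tuple comes from a single <tr>.
    name = tr.select_one('td:nth-of-type(3)')
    date = tr.select_one('td:nth-of-type(6)')
    if name is None or date is None:
        continue  # skip rows that lack either cell, such as the empty <tr>
    rows.append((name.get_text(strip=True), date.get_text(strip=True)))

print(rows)  # [('Name One', '25.01.1980'), ('Name Two', '03.04.1975')]
```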

Answer 1: score 0, accepted, 2023-01-31 11:44:17