How to get the text of certain elements with BeautifulSoup (Python)

I have this kind of HTML code:

<tr>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>
        Name Name Name
      </sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>25.01.1980</sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
</tr>
<tr>...</tr>
<tr>...</tr>

I need to get the text of every 3rd and 5th td of every tr.

Apparently this doesn't work :)

from bs4 import BeautifulSoup
import index

soup = BeautifulSoup(index.index_doc, 'lxml')

for i in soup.find_all('tr')[2:]:
    print(i[2].text, i[4].text)

You could use CSS selectors and the pseudo-class :nth-of-type() to select your elements (I assumed you need the date, so I selected the 6th td instead of the 5th):

data = [e.get_text(strip=True) for e in soup.select('tr td:nth-of-type(3),tr td:nth-of-type(6)')]

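And to get a list of tuples, pair each name with the date that follows it (the selected cells come back in document order, so names and dates alternate):

list(zip(data[::2], data[1::2]))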

Example

from bs4 import BeautifulSoup

html = '''
<tr>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>
        Name Name Name
      </sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
  <td class="a">
    <p>
      <sup>25.01.1980</sup>
    </p>
  </td>
  <td class="a">...</td>
  <td class="a">...</td>
</tr>
<tr>...</tr>
<tr>...</tr>
'''
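# name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')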

data = [e.get_text(strip=True) for e in soup.select('tr td:nth-of-type(3),tr td:nth-of-type(6)')]

# pair each name with the date that follows it
list(zip(data[::2], data[1::2]))
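
If you would rather keep each row's values together, another option is to loop over the rows and index the cells of each one, which is close to what the code in the question was trying to do. This is only a sketch: the two-row table and the names "Alice" and "Bob" are made up for illustration, and it assumes the cells you want are the 3rd and 6th td of each row. Note that Python list indices are 0-based while :nth-of-type() counts from 1.

```python
from bs4 import BeautifulSoup

# Hypothetical table, made up for illustration (not the asker's real data).
html = '''
<table>
<tr>
  <td>a</td><td>b</td><td><p><sup> Alice </sup></p></td>
  <td>c</td><td>d</td><td><p><sup>25.01.1980</sup></p></td>
</tr>
<tr>
  <td>a</td><td>b</td><td><p><sup> Bob </sup></p></td>
  <td>c</td><td>d</td><td><p><sup>01.02.1990</sup></p></td>
</tr>
<tr><td>too short</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

rows = []
for tr in soup.find_all('tr'):
    # Only direct child cells, so a nested table would not shift the indices.
    cells = tr.find_all('td', recursive=False)
    if len(cells) < 6:
        continue  # skip rows that have no 6th cell
    # 0-based indices: cells[2] is the 3rd td, cells[5] is the 6th td.
    rows.append((cells[2].get_text(strip=True), cells[5].get_text(strip=True)))

print(rows)
```

A Tag cannot be indexed by position (i[2] is treated as an attribute lookup), which is why the cells are collected into a list with find_all() first. Rows that are too short are skipped rather than raising an IndexError.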
