Python Beautiful Soup: get text from an element

Question

I am looping through <td> elements, but I am struggling to extract the text of each <td>.

HTML:

<td class="cell">
 Brand Name 1
 <br/>
 (
 <a class="tip" title="This title">
  Authorised Resellers
 </a>
 )
</td>

Desired output:

Brand name: Brand name 1
Brand distribution type: Authorised Reseller

I tried:

for brand in brand_loop:
  print(brand.text)

But this does not print the text right after the opening <td> tag ("Brand Name 1").

Any suggestions? Thanks!

Answer 1

Try:

for brand in brand_loop:
  print(brand.text)
  print(brand.find('a').text)

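.text gives you the text of whichever element you call it on. On the <td> it returns all the text inside the cell, including the text of the nested <a>, so call find('a') first when you want only the link text.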
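As a rough illustration of the point above, the sketch below compares three ways of reading text from the cell in the question. The variable names (`cell`, `all_text`, `link_text`, `direct_text`) are made up for this example, and `html.parser` is used only so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup, NavigableString

html = '''<td class="cell">
 Brand Name 1
 <br/>
 (
 <a class="tip" title="This title">
  Authorised Resellers
 </a>
 )
</td>'''

cell = BeautifulSoup(html, 'html.parser').find('td')

# Every text node in the cell, nested <a> text included, stripped and
# joined with a single space.
all_text = cell.get_text(' ', strip=True)

# Only the text of the nested <a>.
link_text = cell.find('a').get_text(strip=True)

# Only the text nodes that are direct children of <td>; the <a> text is
# skipped because it belongs to the <a> tag, not to the <td> itself.
direct_text = [
    child.strip()
    for child in cell.children
    if isinstance(child, NavigableString) and child.strip()
]

print(all_text)
print(link_text)
print(direct_text)
```

The first entry of `direct_text` is the brand name, which is one way to separate it from the link text without relying on document order.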

Answer 2

You can select <td class="cell">, then use .find_next(text=True) to get the brand name, and then .find_next('a') to get the brand distribution type.

For example:

from bs4 import BeautifulSoup

txt = '''<td class="cell">
 Brand Name 1
 <br/>
 (
 <a class="tip" title="This title">
  Authorised Resellers
 </a>
 )
</td>'''


soup = BeautifulSoup(txt, 'html.parser')

# First text node after the opening <td> tag
brand_name = soup.select_one('td.cell').find_next(text=True)
# Text of the next <a> tag after that text node
brand_distribution = brand_name.find_next('a').text

print('Brand name:', brand_name.strip())
print('Brand distribution type:', brand_distribution.strip())

Prints:

Brand name: Brand Name 1
Brand distribution type: Authorised Resellers
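Since the question loops over several cells, here is a minimal sketch of the same idea applied inside a loop. The second cell ("Brand Name 2", "Distributors") is invented sample data, and the sketch uses `string=True`, which newer Beautiful Soup releases accept as the replacement for `text=True`. It assumes every cell starts with its own text, otherwise `find_next` would walk on into later content.

```python
from bs4 import BeautifulSoup

# Hypothetical table with two cells shaped like the one in the question.
html = '''<table><tr>
<td class="cell"> Brand Name 1 <br/> ( <a class="tip">Authorised Resellers</a> ) </td>
<td class="cell"> Brand Name 2 <br/> ( <a class="tip">Distributors</a> ) </td>
</tr></table>'''

soup = BeautifulSoup(html, 'html.parser')

rows = []
for cell in soup.find_all('td', class_='cell'):
    # First text node after the opening <td> tag.
    name = cell.find_next(string=True).strip()
    # Look for the link inside this cell only, and tolerate a missing link.
    link = cell.find('a')
    distribution = link.get_text(strip=True) if link else None
    rows.append((name, distribution))

for name, distribution in rows:
    print('Brand name:', name)
    print('Brand distribution type:', distribution)
```

Searching for the link with `cell.find('a')` rather than `find_next('a')` keeps the lookup inside the current cell, so a cell without a link does not pick up the link from the next one.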

Answer 3

You can use find() and next_element to get the first text of the td tag, and simply use find() to get the text of the a tag. You can try:

from bs4 import BeautifulSoup
html_doc = '''<td class="cell">
 Brand Name 1
 <br/>
 (
 <a class="tip" title="This title">
  Authorised Resellers
 </a>
 )
</td>'''

soup = BeautifulSoup(html_doc,'lxml')
brand_name = soup.find("td").next_element.strip()
brand_distribution_type = soup.find("a").text.strip()
print('Brand name:', brand_name)
print('Brand distribution type:', brand_distribution_type)

The output will be:

Brand name: Brand Name 1
Brand distribution type: Authorised Resellers
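One caveat with next_element is that it returns whatever node comes first, which is a tag rather than a string if the cell starts with markup. A possible way to guard against that, sketched below, is to take the first item of `stripped_strings` instead. The helper name `first_text` and the `<b>` wrapper in the sample markup are assumptions made up for this example.

```python
from bs4 import BeautifulSoup

def first_text(tag):
    """Return the first non-empty stripped string inside tag, or None."""
    return next(tag.stripped_strings, None)

# Hypothetical variation of the cell where the brand name is wrapped in <b>,
# so the first node after <td> is a tag, not a string.
html_doc = ('<td class="cell"><b> Brand Name 1 </b><br/>'
            '(<a class="tip">Authorised Resellers</a>)</td>')

cell = BeautifulSoup(html_doc, 'html.parser').find('td')
brand_name = first_text(cell)

# An empty cell yields None instead of raising an error.
empty_cell = BeautifulSoup('<td></td>', 'html.parser').find('td')
empty_name = first_text(empty_cell)

print('Brand name:', brand_name)
print('Empty cell:', empty_name)
```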
