How to get the whole text from the same HTML tags inside a specific HTML tag in one line?

Question

I have a pretty long HTML file that looks like:

<div><nobr>
<span>ABC</span>
<span>DEF</span>
<span>GHI</span>
</nobr></div>

<div><nobr>
<span>100</span>
</nobr></div>

<div><nobr>
<span>JKL</span>
<span>MNO</span>
<span>PQR</span>
</nobr></div>

<div><nobr>
<span>101</span>
</nobr></div>

This is what I have tried:

soup = BeautifulSoup(html_code, 'lxml')
nobr_tags = soup.select('nobr')

How can I get the whole text inside the span tags of each nobr tag in one line using BeautifulSoup?

What I want to get is:

ABCDEFGHI, 100, JKLMNOPQR, 101, ...

But what I got was:

ABC, DEF, GHI, 100, JKL, MNO, PQR, 101, ...

Some nobr tags contain 2, 3, or 4 span tags.
No matter how many span tags there are in a nobr tag, I want to get all the text inside each nobr tag in one line.

Answer 1

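You can use a generator expression with join() to combine the text of each nobr tag, separated by ", ". Calling get_text(strip=True) on a nobr tag concatenates the text of all the span tags inside it.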

from bs4 import BeautifulSoup

html_doc = """
<div>
   <nobr>
      <span>ABC</span>
      <span>DEF</span>
      <span>GHI</span>
   </nobr>
</div>
<div>
   <nobr>
      <span>100</span>
   </nobr>
</div>
<div>
   <nobr>
      <span>JKL</span>
      <span>MNO</span>
      <span>PQR</span>
   </nobr>
</div>
<div>
   <nobr>
      <span>101</span>
   </nobr>
</div>
"""

soup = BeautifulSoup(html_doc, "lxml")

print(
    ", ".join(x.get_text(strip=True) for x in soup.select("nobr"))
)

Output:

ABCDEFGHI, 100, JKLMNOPQR, 101
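
To make the difference between the two results in the question easier to see, here is a minimal sketch that contrasts selecting the span tags directly with selecting each nobr tag. The sample HTML is a shortened, hypothetical copy of the question's markup, the variable names (per_span, per_nobr, spaced) are made up for illustration, and "html.parser" is swapped in for "lxml" only to avoid the extra dependency. It is one plausible explanation of the split output, not necessarily what the asker's code did.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample that mirrors the structure in the question.
html_doc = """
<div><nobr>
<span>ABC</span>
<span>DEF</span>
<span>GHI</span>
</nobr></div>
<div><nobr>
<span>100</span>
</nobr></div>
"""

# "html.parser" ships with Python; "lxml" should behave the same on this input.
soup = BeautifulSoup(html_doc, "html.parser")

# Selecting the span tags themselves gives one string per span.
per_span = [span.get_text(strip=True) for span in soup.select("nobr span")]

# Selecting each nobr tag gives one string per nobr: get_text(strip=True)
# strips every text fragment and joins them with an empty separator.
per_nobr = [nobr.get_text(strip=True) for nobr in soup.select("nobr")]

# Passing a separator keeps the span boundaries visible inside each group.
spaced = [nobr.get_text(" ", strip=True) for nobr in soup.select("nobr")]

print(", ".join(per_span))  # ABC, DEF, GHI, 100
print(", ".join(per_nobr))  # ABCDEFGHI, 100
print(", ".join(spaced))    # ABC DEF GHI, 100
```

If the groups are needed separately rather than as one printed line, keeping the list (per_nobr here) instead of joining it right away is probably the more convenient form.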

Answer 1 · 2 · Accepted · 2021-07-21 18:18:33

如何从特定 HTML 标签内的相同 html 标签中获取一行中的整个文本？

问题描述

1 个解决方案

解决方案1 2 已采纳 2021-07-21 18:18:33

解决方案1
2 已采纳 2021-07-21 18:18:33