How to get the whole text in one line from the same HTML tags inside a specific HTML tag?
I have a pretty long HTML file that looks like:
<div><nobr>
<span>ABC</span>
<span>DEF</span>
<span>GHI</span>
</nobr></div>
<div><nobr>
<span>100</span>
</nobr></div>
<div><nobr>
<span>JKL</span>
<span>MNO</span>
<span>PQR</span>
</nobr></div>
<div><nobr>
<span>101</span>
</nobr></div>
This is what I have tried:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_code, 'lxml')
nobr_tags = soup.select('nobr')
How can I get the whole text inside the <span> tags of a <nobr> tag in one line using BeautifulSoup?
What I want to get is:
ABCDEFGHI, 100, JKLMNOPQR, 101, ...
But what I got was:
ABC, DEF, GHI, 100, JKL, MNO, PQR, 101, ...
Some <nobr> tags contain 2, 3, or 4 <span> tags.
No matter how many <span> tags there are in a <nobr> tag, I want to get all the text inside that <nobr> tag in one line.
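You can use a generator expression to join() the text of each <nobr> tag with ", ". Calling get_text(strip=True) on a <nobr> tag concatenates the text of all the <span> tags inside it and strips the whitespace around each piece.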
from bs4 import BeautifulSoup
html_doc = """
<div>
<nobr>
<span>ABC</span>
<span>DEF</span>
<span>GHI</span>
</nobr>
</div>
<div>
<nobr>
<span>100</span>
</nobr>
</div>
<div>
<nobr>
<span>JKL</span>
<span>MNO</span>
<span>PQR</span>
</nobr>
</div>
<div>
<nobr>
<span>101</span>
</nobr>
</div>
"""
soup = BeautifulSoup(html_doc, "lxml")
print(
    ", ".join(x.get_text(strip=True) for x in soup.select("nobr"))
)
Output:
ABCDEFGHI, 100, JKLMNOPQR, 101
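The difference between the two results in the question most likely comes down to which tag is selected. That is an assumption, since the question does not show how the text was extracted. The sketch below contrasts selecting the <span> tags directly with selecting the <nobr> tags, using a shortened copy of the sample HTML. The helper name nobr_texts is hypothetical, and "html.parser" (the standard-library parser) is swapped in for "lxml" only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Shortened sample input based on the HTML in the question.
html_doc = """
<div><nobr>
<span>ABC</span>
<span>DEF</span>
<span>GHI</span>
</nobr></div>
<div><nobr>
<span>100</span>
</nobr></div>
"""

# Assumption: "html.parser" is used here instead of "lxml"; for this
# well-formed input both parsers produce the same tree.
soup = BeautifulSoup(html_doc, "html.parser")


def nobr_texts(soup):
    """Return one string per <nobr> tag, built from its <span> descendants.

    Hypothetical helper, not part of BeautifulSoup.
    """
    return [
        "".join(span.get_text(strip=True) for span in nobr.select("span"))
        for nobr in soup.select("nobr")
    ]


# Selecting the <span> tags directly gives one item per <span>.
per_span = [span.get_text(strip=True) for span in soup.select("span")]

# Selecting the <nobr> tags gives one item per <nobr>.
per_nobr = nobr_texts(soup)

print(", ".join(per_span))  # ABC, DEF, GHI, 100
print(", ".join(per_nobr))  # ABCDEFGHI, 100
```

Keeping the per-tag strings in a list before joining also makes it easy to write them out one per line instead of comma-separated, if that is what is needed.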