
How to get the whole text in one line from the same html tags inside a specific HTML tag?

I have a pretty long HTML file that looks like:

<div><nobr>
<span>ABC</span>
<span>DEF</span>
<span>GHI</span>
</nobr></div>

<div><nobr>
<span>100</span>
</nobr></div>

<div><nobr>
<span>JKL</span>
<span>MNO</span>
<span>PQR</span>
</nobr></div>

<div><nobr>
<span>101</span>
</nobr></div>

This is what I have tried:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_code, 'lxml')
nobr_tags = soup.select('nobr')

How can I get the whole text inside span tags in a nobr HTML tag in one line using BeautifulSoup?

What I want to get is:

ABCDEFGHI, 100, JKLMNOPQR, 101, ... 

But what I got was:

ABC, DEF, GHI, 100, JKL, MNO, PQR, 101, ...

Some <nobr> tags contain 2, 3, or 4 <span> tags.
No matter how many <span> tags a <nobr> tag contains, I want all the text inside that <nobr> tag on one line.

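Calling get_text(strip=True) on each <nobr> tag concatenates the stripped text of all the <span> tags inside it. You can then pass a generator expression to ", ".join() to separate the per-tag results with ", ".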

from bs4 import BeautifulSoup

html_doc = """
<div>
   <nobr>
      <span>ABC</span>
      <span>DEF</span>
      <span>GHI</span>
   </nobr>
</div>
<div>
   <nobr>
      <span>100</span>
   </nobr>
</div>
<div>
   <nobr>
      <span>JKL</span>
      <span>MNO</span>
      <span>PQR</span>
   </nobr>
</div>
<div>
   <nobr>
      <span>101</span>
   </nobr>
</div>
"""

soup = BeautifulSoup(html_doc, "lxml")

print(
    ", ".join(x.get_text(strip=True) for x in soup.select("nobr"))
)

Output:

ABCDEFGHI, 100, JKLMNOPQR, 101
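
If a <nobr> tag might also contain text outside its <span> tags, a variation is to join only the <span> text for each <nobr>. The following is a minimal sketch, not part of the original answer: the helper name nobr_lines is hypothetical, it assumes the <span> tags are direct children of the <nobr> tag, and it uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

html_doc = """
<div><nobr>
<span>ABC</span>
<span>DEF</span>
<span>GHI</span>
</nobr></div>
<div><nobr>
<span>100</span>
</nobr></div>
"""

# "html.parser" ships with Python, so this sketch does not need lxml.
soup = BeautifulSoup(html_doc, "html.parser")


def nobr_lines(soup):
    """Return one string per <nobr>, built only from its direct <span> children.

    Hypothetical helper for illustration; assumes spans are not nested
    deeper than one level inside each <nobr>.
    """
    return [
        "".join(
            span.get_text(strip=True)
            for span in nobr.find_all("span", recursive=False)
        )
        for nobr in soup.find_all("nobr")
    ]


lines = nobr_lines(soup)
print(", ".join(lines))  # ABCDEFGHI, 100
```

Keeping the per-tag strings in a list (lines here) also makes it easy to write each one on its own line or process them further, instead of building a single comma-separated string.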

