繁体   English   中英

如何通过漂亮的汤 python 在标签内获取 html 文本

[英]how to get a html text inside tag through beautiful soup python

如何通过beautifulsoup提取以下数据?

<Tag1>
                    <message code="able to extract text from here"/>
                    <text value="able to extract text that is here"/>
                    <htmlText>&lt;![CDATA[&lt;p&gt;some thing &lt;lite&gt;OR&lt;/lite&gt;get exact data from here&lt;/p&gt;]]&gt;</htmlText>
</Tag1>

我尝试了.findall.get_text ,但是我无法提取“htmlText”值中的文本

我期待的输出是“某事或从这里获取确切数据”

您可以使用 BeautifulSoup 两次,首先提取htmlText元素,然后解析内容。 例如:

from bs4 import BeautifulSoup

html = """
<Tag1>
    <message code="able to extract text from here"/>
    <text value="able to extract text that is here"/>
    <htmlText>&lt;![CDATA[&lt;p&gt;some thing &lt;lite&gt;OR&lt;/lite&gt;get exact data from here&lt;/p&gt;]]&gt;</htmlText>
</Tag1>
"""
soup = BeautifulSoup(html, "lxml")

for tag1 in soup.find_all("tag1"):
    cdata_html = tag1.htmltext.text
    cdata_soup = BeautifulSoup(cdata_html, "lxml")
    
    print(cdata_soup.p.text)

哪个会显示:

some thing ORget exact data from here

在线IDE中的代码和示例使用最易读的):

from bs4 import BeautifulSoup
import lxml

html = """
<Tag1>
    <message code="able to extract text from here"/>
    <text value="able to extract text that is here"/>
    <htmlText>&lt;![CDATA[&lt;p&gt;some thing &lt;lite&gt;OR&lt;/lite&gt;get exact data from here&lt;/p&gt;]]&gt;</htmlText>
</Tag1>
"""

soup = BeautifulSoup(html, "lxml")


#  BeautifulSoup inside BeautifulSoup
undreadable_soup = BeautifulSoup(BeautifulSoup(html, "lxml").select_one('htmlText').text, "lxml").p.text
print(undreadable_soup)


example_1 = BeautifulSoup(soup.select_one('htmlText').text, "lxml").p.text
print(text_1)


# wihtout hardcoded list slices
for result in soup.select("htmlText"):
    example_2 = BeautifulSoup(result.text, "lxml").p.text
    print(example_2)


# or one liner
example_3 = ''.join([BeautifulSoup(result.text, "lxml").p.text for result in soup.select("htmlText")])
print(example_3)


# output
'''
some thing ORget exact data from here
some thing ORget exact data from here
some thing ORget exact data from here
some thing ORget exact data from here
'''

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM