
How to pull text from HTML with Python BeautifulSoup

I have the following text from a web page:

<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term</df> 
</span>Here is the meaning of my term and its description; (<span 
class="TermLink" lang="fr">définition</span>)</p></dd>
<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term 
2</df></span>Here is the meaning of my term 2 and its description; (<span 
class="TermLink" lang="fr">définition</span>)</p></dd>
<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term 
3</df></span>Here is the meaning of my term 3 and its description; (<span 
class="TermLink" lang="fr">définition</span>)</p></dd>

I am trying to use the Python BeautifulSoup library to pull each DefinitionTerm, e.g. "Example Term", followed by its description.

Hence I would like to see:

"Example Term", "Here is the meaning of my term and its description"
"Example Term 2", "Here is the meaning of my term 2 and its description"
"Example Term 3", "Here is the meaning of my term 3 and its description"

from bs4 import BeautifulSoup

html = '''<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term</df> </span>Here is the meaning of my term and its description; (<span class="TermLink" lang="fr">définition</span>)</p></dd><dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term 2</df></span>Here is the meaning of my term 2 and its description; (<span class="TermLink" lang="fr">définition</span>)</p></dd><dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term 3</df></span>Here is the meaning of my term 3 and its description; (<span class="TermLink" lang="fr">définition</span>)</p></dd>'''

soup = BeautifulSoup(html, 'html.parser')

# Prints the full text of each definition paragraph (term, description and link text run together)
for each in soup.find_all('p', class_='Definition'):
    print(each.get_text())
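
The loop above prints the term, the description and the French link text as one string. One possible way to get the term and the description separately is sketched below. It assumes that the description is always the plain text node directly after the `DefinitionTerm` span and that it ends with `; (` before the `TermLink` span, as in the sample. The function name `extract_definitions` and the trimmed `sample` string are made up for illustration.

```python
from bs4 import BeautifulSoup, NavigableString


def extract_definitions(html):
    """Return a list of (term, description) tuples from the definition markup."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for p in soup.find_all("p", class_="Definition"):
        term_span = p.find("span", class_="DefinitionTerm")
        if term_span is None:
            continue  # skip paragraphs without a term
        # Collapse line breaks and repeated spaces inside the term
        term = " ".join(term_span.get_text().split())
        # Assumption: the description is the text node right after the term span
        sibling = term_span.next_sibling
        text = str(sibling) if isinstance(sibling, NavigableString) else ""
        # Collapse whitespace, then drop the trailing "; (" before the link span
        description = " ".join(text.split()).rstrip(" ;(")
        pairs.append((term, description))
    return pairs


# Shortened, hypothetical sample based on the markup in the question
sample = (
    '<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term</df> '
    '</span>Here is the meaning of my term and its description; (<span '
    'class="TermLink" lang="fr">définition</span>)</p></dd>'
    '<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term '
    '2</df></span>Here is the meaning of my term 2 and its description; (<span '
    'class="TermLink" lang="fr">définition</span>)</p></dd>'
)

for term, description in extract_definitions(sample):
    print(f'"{term}", "{description}"')
```

If the real page puts other tags between the term and the description, `next_sibling` will not be a text node and the description comes back empty, so the sibling lookup would need adjusting for that layout.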


 