
How to pull text from HTML with Python BeautifulSoup

I have the following text from a web page:

<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term</df> 
</span>Here is the meaning of my term and its description; (<span 
class="TermLink" lang="fr">définition</span>)</p></dd>
<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term 
2</df></span>Here is the meaning of my term 2 and its description; (<span 
class="TermLink" lang="fr">définition</span>)</p></dd>
<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term 
3</df></span>Here is the meaning of my term 3 and its description; (<span 
class="TermLink" lang="fr">définition</span>)</p></dd>

I am trying to use the Python BeautifulSoup library to pull each DefinitionTerm (e.g. "Example Term") followed by its description.

So I would like to see:

"Example Term", "Here is the meaning of my term and its description"
"Example Term 2", "Here is the meaning of my term 2 and its description"
"Example Term 3", "Here is the meaning of my term 3 and its description"

Here is my code so far:

from bs4 import BeautifulSoup

html = '''<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term</df> </span>Here is the meaning of my term and its description; (<span class="TermLink" lang="fr">définition</span>)</p></dd><dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term 2</df></span>Here is the meaning of my term 2 and its description; (<span class="TermLink" lang="fr">définition</span>)</p></dd><dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term 3</df></span>Here is the meaning of my term 3 and its description; (<span class="TermLink" lang="fr">définition</span>)</p></dd>'''

soup = BeautifulSoup(html, 'html.parser')

# Prints the full text of each definition paragraph (term, description and French link together)
for each in soup.find_all('p', class_='Definition'):
    print(each.get_text())
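One possible way to get the term and the description as separate values is sketched below. It assumes the markup always has the shape shown above: a `DefinitionTerm` span directly followed by the description text, which ends in "; (" before the `TermLink` span. The helper name `extract_definitions` is made up for illustration, and the sample HTML is a shortened copy of the snippet from the question.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened copy of the sample markup from the question.
html = (
    '<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term</df> </span>'
    'Here is the meaning of my term and its description; '
    '(<span class="TermLink" lang="fr">définition</span>)</p></dd>'
    '<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term 2</df></span>'
    'Here is the meaning of my term 2 and its description; '
    '(<span class="TermLink" lang="fr">définition</span>)</p></dd>'
    '<dd><p class="Definition"><span class="DefinitionTerm"><df>Example Term 3</df></span>'
    'Here is the meaning of my term 3 and its description; '
    '(<span class="TermLink" lang="fr">définition</span>)</p></dd>'
)


def extract_definitions(markup):
    """Return a list of (term, description) tuples (hypothetical helper)."""
    soup = BeautifulSoup(markup, 'html.parser')
    results = []
    for p in soup.find_all('p', class_='Definition'):
        term_span = p.find('span', class_='DefinitionTerm')
        if term_span is None:
            continue  # paragraph does not match the assumed structure
        # Collapse line breaks and repeated spaces inside the term.
        term = ' '.join(term_span.get_text().split())
        # Assumption: the description is the text node right after the term span.
        sibling = term_span.next_sibling
        description = str(sibling) if isinstance(sibling, NavigableString) else ''
        # Collapse whitespace, then drop the trailing "; (" before the TermLink span.
        description = ' '.join(description.split()).rstrip(' ;(')
        results.append((term, description))
    return results


for term, description in extract_definitions(html):
    print(f'"{term}", "{description}"')
```

If the real page puts other tags between the term and the description, `next_sibling` will not be a text node and the description comes back empty, so the sibling lookup would need adjusting.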
