I want to extract the translation of a word from an online dictionary. For example, the HTML for 'car':
<ol class="sense_list level_1">
<li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>
How can I extract "any vehicle on wheels" in Python with BeautifulSoup or any other module?
There are multiple ways to reach the desired element.
Probably the simplest would be to find it by class:
soup.find('span', class_='def').text
or, with a CSS selector:
soup.select('span.def')[0].text
or, additionally checking the parents:
soup.select('ol.level_1 > li.level_1 > span.def')[0].text
or:
soup.select('ol.level_1 > li[value="1"] > span.def')[0].text
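For reference, here is a minimal sketch that runs the lookups above against the snippet from the question. It assumes the `bs4` package is installed and uses the built-in `html.parser`. The closing `</ol>` and the variable names are added for illustration, and the attribute value is quoted because CSS does not accept an unquoted value that starts with a digit.

```python
from bs4 import BeautifulSoup

# Snippet from the question; the closing </ol> is added here for completeness.
html = """
<ol class="sense_list level_1">
<li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>
</ol>
"""

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html, "html.parser")

# Each lookup should reach the same <span class="def"> element.
by_class = soup.find("span", class_="def").text
by_css = soup.select("span.def")[0].text
by_parents = soup.select("ol.level_1 > li.level_1 > span.def")[0].text
by_value = soup.select('ol.level_1 > li[value="1"] > span.def')[0].text

print(by_class)
```

If the element might be missing, `soup.select_one(...)` returns `None` instead of raising an `IndexError`, which is easier to check for.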
I solved it with BeautifulSoup:
import bs4
soup = bs4.BeautifulSoup(html, 'html.parser')
q1 = soup.find('li', class_="sense_list_item level_1", value='1').text
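One caveat with this approach: passing a multi-class string to `class_` matches the exact value of the `class` attribute, so it stops matching if the page lists the classes in a different order. The sketch below uses a hypothetical variant of the markup with the classes swapped to show the difference. A CSS selector is one possible order-independent alternative.

```python
import bs4

# Hypothetical variant of the question's markup: same classes, different order.
html = (
    '<ol class="sense_list level_1">'
    '<li class="level_1 sense_list_item" value="1">'
    '<span class="def">any vehicle on wheels</span></li></ol>'
)
soup = bs4.BeautifulSoup(html, "html.parser")

# Exact-string match on the class attribute: finds nothing, the order differs.
exact = soup.find("li", class_="sense_list_item level_1", value="1")

# CSS selector: each class is tested separately, so the order does not matter.
flexible = soup.select_one('li.sense_list_item.level_1[value="1"]')

print(exact)
print(flexible.text)
```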
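Assuming that is the only HTML given, you can use NLTK. Note that this relies on nltk.clean_html, which exists only in NLTK 2.x; since NLTK 3.0 the function raises NotImplementedError and recommends BeautifulSoup's get_text() instead.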
import nltk
# load the HTML chunk into the variable htmlstring
extract = nltk.clean_html(htmlstring)
print(extract)
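Where `nltk.clean_html` is not available, the same tag stripping can be approximated with the standard library alone. This is only a sketch: `TextExtractor` and `strip_tags` are names made up for this example, and it assumes that dropping every tag and keeping the remaining text is all that is needed.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of a document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip whitespace-only nodes such as the newlines between tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


def strip_tags(htmlstring):
    """Return the visible text of htmlstring with all markup removed."""
    parser = TextExtractor()
    parser.feed(htmlstring)
    parser.close()
    return " ".join(parser.chunks)


# Snippet from the question; the closing </ol> is added for completeness.
htmlstring = """
<ol class="sense_list level_1">
<li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>
</ol>
"""

extract = strip_tags(htmlstring)
print(extract)
```

Unlike the BeautifulSoup lookups, this returns all the text in the chunk, so it only isolates the definition when the chunk contains nothing else.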