
Scraping using BeautifulSoup

Here is the source of a sample HTML page:

<html><body><div class="a-section a-spacing-medium a-spacing-top-small" id="feature-bullets">
<ul class="a-vertical a-spacing-none">
<li><span class="a-list-item"> Material: Cotton ; Colour: Light blue</span>     </li>
<li><span class="a-list-item"> Closure Type: Zip</span></li>
<li><span class="a-list-item"> Fit Type: Slim Fit</span></li>
</ul>
</div></body></html>

How can I get the colour value (Light blue) from this HTML page using BeautifulSoup? I tried the following:

color = soup.find('ul', {'class' : 'a-vertical a-spacing-none'}).get('a-list-item')

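The attempt in the question calls .get() on the ul tag, which looks up an HTML attribute named a-list-item and so returns None. Instead, parse the page with BeautifulSoup, collect the text of every element with class a-list-item, and pull the colour out with a regular expression: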

>>> from bs4 import BeautifulSoup
>>> s = '''<html><body><div class="a-section a-spacing-medium a-spacing-top-small" id="feature-bullets">
<ul class="a-vertical a-spacing-none">
<li><span class="a-list-item"> Material: Cotton ; Colour: Light blue</span>     </li>
<li><span class="a-list-item"> Closure Type: Zip</span></li>
<li><span class="a-list-item"> Fit Type: Slim Fit</span></li>
</ul>
</div></body></html>'''
>>> soup = BeautifulSoup(s, 'lxml')
>>> txt = [i.text for i in soup.select('.a-vertical .a-list-item')]
>>> txt
[u' Material: Cotton ; Colour: Light blue', u' Closure Type: Zip', u' Fit Type: Slim Fit']
>>> import re
>>> next(re.search(r'Colour\s*:\s*([^;]+)', j).group(1) for j in txt if 'Colour' in j )
u'Light blue'
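
The session above is from Python 2 (note the u'' prefixes) and needs the lxml parser installed. As a rough Python 3 sketch of the same idea, the bullets can also be split into key/value pairs without a regular expression. It assumes every bullet follows the "Key: value" pattern with ";" between pairs, as in the sample page. The names HTML and parse_features are made up for this illustration, and the built-in "html.parser" is used only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Sample page from the question (assumed representative of the real page).
HTML = '''<html><body><div class="a-section a-spacing-medium a-spacing-top-small" id="feature-bullets">
<ul class="a-vertical a-spacing-none">
<li><span class="a-list-item"> Material: Cotton ; Colour: Light blue</span>     </li>
<li><span class="a-list-item"> Closure Type: Zip</span></li>
<li><span class="a-list-item"> Fit Type: Slim Fit</span></li>
</ul>
</div></body></html>'''


def parse_features(html):
    """Hypothetical helper: return the 'Key: value' pairs of the bullets as a dict."""
    soup = BeautifulSoup(html, "html.parser")
    features = {}
    for span in soup.select("#feature-bullets .a-list-item"):
        # A single bullet may hold several pairs separated by ';'.
        for part in span.get_text().split(";"):
            key, sep, value = part.partition(":")
            if sep:  # skip fragments that contain no ':'
                features[key.strip()] = value.strip()
    return features


features = parse_features(HTML)
print(features.get("Colour"))  # Light blue
```

Using features.get("Colour") returns None when a page has no colour bullet, whereas the next(...) expression above would raise StopIteration in that case.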

