
python web scraping, extracting inner element of tag

I want to scrape products and prices from an online shopping site, and I need help extracting the string between the tags.

from urllib.request import urlopen
from bs4 import BeautifulSoup as soup

my_url='https://www.flipkart.com/cameras/mirrorless~type/pr?sid=jek%2Cp31'
# download the page and parse it
cl=urlopen(my_url)
page_html=cl.read()
cl.close()
ps=soup(page_html,'html5lib')
# one div per product card
cn=ps.findAll('div',{'class':'_1-2Iqu row'})
len(cn)
# the tag that holds the first product's name
cn[0].div.div

# output: <div class="_3wU53n">Canon M50 Mirrorless Camera Body with Single Lens EF-M 15-45 mm ISSTM</div>
# I need: Canon M50 Mirrorless Camera Body with Single Lens EF-M 15-45 mm ISSTM

Replace cn=ps.findAll('div',{'class':'_1-2Iqu row'}) with cn=ps.findAll('div',{'class':'_1-2Iqu row'},text=True)
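
Another way to get only the string between the tags is to read the text from the tag itself with get_text(). The following is a minimal sketch, not a tested scraper for the live site. The HTML snippet is a hypothetical stand-in for one product card, built from the class names in the question, and product_name is a helper name made up for illustration. It uses the built-in 'html.parser' so it runs without html5lib.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one product card; class names copied from the question.
html = '''
<div class="_1-2Iqu row">
  <div>
    <div class="_3wU53n">Canon M50 Mirrorless Camera Body with Single Lens EF-M 15-45 mm ISSTM</div>
  </div>
</div>
'''

ps = BeautifulSoup(html, 'html.parser')
cn = ps.find_all('div', {'class': '_1-2Iqu row'})

def product_name(card):
    """Return the text inside the name div of a product card, or None if it is missing."""
    name_tag = card.find('div', {'class': '_3wU53n'})
    if name_tag is None:
        return None
    # get_text() drops the surrounding tag; strip=True trims leading/trailing whitespace
    return name_tag.get_text(strip=True)

names = [product_name(card) for card in cn]
print(names[0])
```

With the code from the question, the same idea would be cn[0].div.div.get_text(strip=True), assuming the page still uses that markup.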

