Web scraping using Python and bs4
I am trying to scrape data from the following HTML, taken from this Bloomberg URL.
I want to fetch the text of the following labels and their corresponding values:
1. open.
2. previous close.
3. ytd return.
4. market_cap.
5. day range.
6. 52wk_range.
7. current_per_ratio.
8. shares_outstanding.
9. volume.
10. one_year_return.
11. Earnings_per_share.
12. price_sales.
13. dividend_indicated_gross_yield.
I tried but failed; I don't know the correct way to do this with bs4 in Python.
Please guide me toward the result I want.
<div class="data-table data-table_detailed">
  <div class="cell cell__mobile-basic cell__visible__even">
    <div class="cell__label"> Open </div>
    <div class="cell__value cell__value_"> 1,040.40 </div>
  </div>
  <div class="cell cell__mobile-basic">
    <div class="cell__label"> Day Range </div>
    <div class="cell__value cell__value_"> 1,026.00 - 1,044.00 </div>
  </div>
  <div class="cell cell__mobile-basic cell__visible__even">
    <div class="cell__label"> Volume </div>
    <div class="cell__value cell__value_"> 2,580,677 </div>
  </div>
  <div class="cell cell__mobile-basic">
    <div class="cell__label"> Previous Close </div>
    <div class="cell__value cell__value_"> 1,040.45 </div>
  </div>
  <div class="cell cell__mobile-basic cell__visible__even">
    <div class="cell__label"> 52Wk Range </div>
    <div class="cell__value cell__value_"> 900.30 - 1,279.30 </div>
  </div>
  <div class="cell cell__mobile-basic">
    <div class="cell__label"> 1 Yr Return </div>
    <div class="cell__value cell__value_down"> -12.66% </div>
  </div>
  <div class="cell cell__mobile-basic cell__visible__even">
    <div class="cell__label"> YTD Return </div>
    <div class="cell__value cell__value_up"> 2.06% </div>
  </div>
  <div class="cell ">
    <div class="cell__label"> Current P/E Ratio (TTM) </div>
    <div class="cell__value cell__value_"> 16.43 </div>
  </div>
  <div class="cell cell__visible__even">
    <div class="cell__label"> Earnings per Share (INR) (TTM) </div>
    <div class="cell__value cell__value_"> 62.76 </div>
  </div>
  <div class="cell ">
    <div class="cell__label"> Market Cap (t INR) </div>
    <div class="cell__value cell__value_"> 2.369 </div>
  </div>
  <div class="cell cell__visible__even">
    <div class="cell__label"> Shares Outstanding (b) </div>
    <div class="cell__value cell__value_"> 2.297 </div>
  </div>
  <div class="cell ">
    <div class="cell__label"> Price/Sales (TTM) </div>
    <div class="cell__value cell__value_"> 3.47 </div>
  </div>
  <div class="cell cell__visible__even">
    <div class="cell__label"> Dividend Indicated Gross Yield </div>
    <div class="cell__value cell__value_"> 2.45% </div>
  </div>
</div>
for cell in soup.find_all("div", class_='data-table data-table_detailed'):
    name = ""
    namecell = cell.find("div", class_="cell__label", text=True)
    if namecell is not None:
        name = namecell.get_text(strip=True)
    price_chage = cell.find("div", class_="cell__value cell__value").get_text(strip=True)
    data+=( "%s: Price Change: %s," % (name, price_chage))
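Two things probably make the attempt above fail. First, the loop iterates over the single outer data-table container, so each find call only ever sees the first label. Second, the string "cell__value cell__value" does not equal any class attribute in the markup (the actual attributes are "cell__value cell__value_", "cell__value cell__value_up" and "cell__value cell__value_down"), so that lookup would return None. The sketch below illustrates both points on a hypothetical two-cell fragment shaped like the markup in the question; the fragment and the variable names are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical two-cell fragment in the same shape as the markup in the question.
SAMPLE_HTML = """
<div class="data-table data-table_detailed">
  <div class="cell"><div class="cell__label"> Open </div>
    <div class="cell__value cell__value_"> 1,040.40 </div></div>
  <div class="cell"><div class="cell__label"> YTD Return </div>
    <div class="cell__value cell__value_up"> 2.06% </div></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-word class_ string only matches that exact class attribute value,
# so this lookup finds nothing and returns None.
missing = soup.find("div", class_="cell__value cell__value")

# A single class name matches every element that carries it,
# whatever other classes the element has.
values = [div.get_text(strip=True)
          for div in soup.find_all("div", class_="cell__value")]

# The outer container occurs once; the label/value pairs live in its children.
containers = soup.find_all("div", class_="data-table_detailed")
cells = soup.find_all("div", class_="cell")
```

Since `missing` is None here, calling `.get_text()` on it would raise an AttributeError, which would explain the original failure.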
To all the geeks passing by and downvoting: this is the code, and I wrote it myself. Congratulations to me! If you can't help anyone, then please don't downvote.
# soup is the BeautifulSoup object built from the page HTML.
tbl_data = []
# Each div with the class "cell" holds one label/value pair.
for cell in soup.find_all("div", class_="cell"):
    label = cell.find("div", class_="cell__label")
    # The value div's second class varies (cell__value_, cell__value_up,
    # cell__value_down), so match on the shared "cell__value" class only.
    value = cell.find("div", class_="cell__value")
    if label is not None and value is not None:
        tbl_data.append(label.get_text(strip=True) + ":" + value.get_text(strip=True))
print(tbl_data)
Output:
[u'Open:1,040.40', u'Day Range:1,026.00 - 1,044.00', u'Volume:2,580,677', u'Previous Close:1,040.45', u'52Wk Range:900.30 - 1,279.30', u'1 Yr Return:-12.66%', u'Current P/E Ratio (TTM):16.43', u'Earnings per Share (INR) (TTM):62.76', u'Market Cap (t INR):2.369', u'Shares Outstanding (b):2.297', u'Price/Sales (TTM):3.47', u'Dividend Indicated Gross Yield:2.45%']
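To get the values under the field names listed in the question rather than as "label:value" strings, one option is to map each page label to a key and collect the results in a dict. This is a minimal sketch, not a definitive implementation: the `LABEL_TO_KEY` mapping, the `parse_cells` function and the sample fragment are hypothetical names introduced here, and the mapping assumes the label wording shown in the markup above.

```python
from bs4 import BeautifulSoup

# Assumed mapping from the labels in the markup to the field names listed in
# the question; adjust it if the page wording changes.
LABEL_TO_KEY = {
    "Open": "open",
    "Previous Close": "previous_close",
    "YTD Return": "ytd_return",
    "Market Cap (t INR)": "market_cap",
    "Day Range": "day_range",
    "52Wk Range": "52wk_range",
    "Current P/E Ratio (TTM)": "current_per_ratio",
    "Shares Outstanding (b)": "shares_outstanding",
    "Volume": "volume",
    "1 Yr Return": "one_year_return",
    "Earnings per Share (INR) (TTM)": "earnings_per_share",
    "Price/Sales (TTM)": "price_sales",
    "Dividend Indicated Gross Yield": "dividend_indicated_gross_yield",
}


def parse_cells(html):
    """Return {field_name: value_text} for every label/value cell in html."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for cell in soup.find_all("div", class_="cell"):
        label = cell.find("div", class_="cell__label")
        value = cell.find("div", class_="cell__value")
        if label is None or value is None:
            continue  # skip cells that lack either part
        # Collapse internal whitespace, since text may wrap across lines.
        label_text = " ".join(label.get_text().split())
        value_text = " ".join(value.get_text().split())
        # Fall back to the raw label when it is not in the mapping.
        result[LABEL_TO_KEY.get(label_text, label_text)] = value_text
    return result


# Hypothetical fragment in the same shape as the markup in the question.
sample = """
<div class="data-table data-table_detailed">
  <div class="cell cell__mobile-basic"><div class="cell__label"> 52Wk
    Range </div><div class="cell__value cell__value_"> 900.30 -
    1,279.30 </div></div>
  <div class="cell "><div class="cell__label"> 1 Yr Return </div>
    <div class="cell__value cell__value_down"> -12.66% </div></div>
</div>
"""
print(parse_cells(sample))
```

For the sample fragment this yields the keys `52wk_range` and `one_year_return`. The values stay as strings; converting them to numbers would need extra handling for thousands separators, ranges and percent signs.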