
Web scraping using Python and bs4

I am trying to scrape data from the following HTML, taken from this Bloomberg page.

I want to fetch the text of the following labels and their corresponding values:

1. open
2. previous close
3. ytd return
4. market_cap
5. day range
6. 52wk_range
7. current_per_ratio
8. shares_outstanding
9. volume
10. one_year_return
11. earnings_per_share
12. price_sales
13. dividend_indicated_gross_yield

I tried the code shown after the HTML, but it failed. I don't know the correct way to do this with bs4 in Python.

Please guide me on how to extract each label together with its value.

<div class="data-table data-table_detailed">
  <!-- no spaces --><div class="cell cell__mobile-basic cell__visible__even"> <div class="cell__label"> Open </div> <div class="cell__value cell__value_"> 1,040.40 </div> </div><!-- no spaces -->
  <!-- no spaces --><div class="cell cell__mobile-basic"> <div class="cell__label"> Day Range </div> <div class="cell__value cell__value_"> 1,026.00 - 1,044.00 </div> </div><!-- no spaces -->
  <!-- no spaces --><div class="cell cell__mobile-basic cell__visible__even"> <div class="cell__label"> Volume </div> <div class="cell__value cell__value_"> 2,580,677 </div> </div><!-- no spaces -->
  <!-- no spaces --><div class="cell cell__mobile-basic"> <div class="cell__label"> Previous Close </div> <div class="cell__value cell__value_"> 1,040.45 </div> </div><!-- no spaces -->
  <!-- no spaces --><div class="cell cell__mobile-basic cell__visible__even"> <div class="cell__label"> 52Wk Range </div> <div class="cell__value cell__value_"> 900.30 - 1,279.30 </div> </div><!-- no spaces -->
  <!-- no spaces --><div class="cell cell__mobile-basic"> <div class="cell__label"> 1 Yr Return </div> <div class="cell__value cell__value_down"> -12.66% </div> </div><!-- no spaces -->
  <!-- no spaces --><div class="cell cell__mobile-basic cell__visible__even"> <div class="cell__label"> YTD Return </div> <div class="cell__value cell__value_up"> 2.06% </div> </div><!-- no spaces -->
  <!-- no spaces --><div class="cell "> <div class="cell__label"> Current P/E Ratio (TTM) </div> <div class="cell__value cell__value_"> 16.43 </div> </div><!-- no spaces -->
  <!-- no spaces --><div class="cell  cell__visible__even"> <div class="cell__label"> Earnings per Share (INR) (TTM) </div> <div class="cell__value cell__value_"> 62.76 </div> </div><!-- no spaces -->
  <!-- no spaces --><div class="cell "> <div class="cell__label"> Market Cap (t INR) </div> <div class="cell__value cell__value_"> 2.369 </div> </div><!-- no spaces -->
  <!-- no spaces --><div class="cell  cell__visible__even"> <div class="cell__label"> Shares Outstanding  (b) </div> <div class="cell__value cell__value_"> 2.297 </div> </div><!-- no spaces -->
  <!-- no spaces --><div class="cell "> <div class="cell__label"> Price/Sales (TTM) </div> <div class="cell__value cell__value_"> 3.47 </div> </div><!-- no spaces -->
  <!-- no spaces --><div class="cell  cell__visible__even"> <div class="cell__label"> Dividend Indicated Gross Yield </div> <div class="cell__value cell__value_"> 2.45% </div> </div><!-- no spaces -->
</div>

for cell in soup.find_all("div", class_='data-table data-table_detailed'):
    name = ""
    namecell = cell.find("div", class_="cell__label", text=True)
    if namecell is not None:
         name = namecell.get_text(strip=True)
    price_chage = cell.find("div", class_="cell__value cell__value").get_text(strip=True)
    data+=( "%s: Price Change:  %s," % (name, price_chage))

To everyone passing by and downvoting: this is the code, and I wrote it myself. Congratulations to me! If you can't help anyone, then don't downvote.

# soup is the BeautifulSoup object for the page shown above
tbl_data = []  # collects "label:value" strings
for cell in soup.find_all("div", class_='cell '):
    # Label text of this cell, e.g. "Open"
    namecell = cell.find("div", class_="cell__label", text=True).get_text(strip=True)
    # The value div carries one of three class variants
    if cell.find("div", class_="cell__value cell__value_down", text=True):
        classText = "cell__value cell__value_down"
    elif cell.find("div", class_="cell__value cell__value_up", text=True):
        classText = "cell__value cell__value_up"
    else:
        classText = "cell__value cell__value_"

    value = cell.find("div", class_=classText, text=True).get_text(strip=True)
    if namecell and value is not None:
        tbl_data.append(namecell + ":" + value)
print(tbl_data)

The output is:

[u'Open:1,040.40', u'Day Range:1,026.00 - 1,044.00', u'Volume:2,580,677', u'Previous Close:1,040.45', u'52Wk Range:900.30 - 1,279.30', u'1 Yr Return:-12.66%', u'Current P/E Ratio (TTM):16.43', u'Earnings per Share (INR) (TTM):62.76', u'Market Cap (t INR):2.369', u'Shares Outstanding  (b):2.297', u'Price/Sales (TTM):3.47', u'Dividend Indicated Gross Yield:2.45%']
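As an alternative to checking each `cell__value_*` variant by hand, the same extraction can probably be written with CSS selectors, which match individual class tokens rather than the whole `class` string. The following is only a sketch under a few assumptions: the `HTML` constant is a trimmed copy of the markup from the question, `parse_cells` is a hypothetical helper name, and `html.parser` is used so that no extra parser needs to be installed. It has not been run against the live Bloomberg page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question; four cells are enough to
# cover the "cell__value_", "cell__value_down" and "cell__value_up" variants
# and the "cell " class with a trailing space.
HTML = """
<div class="data-table data-table_detailed">
  <div class="cell cell__mobile-basic cell__visible__even">
    <div class="cell__label"> Open </div>
    <div class="cell__value cell__value_"> 1,040.40 </div>
  </div>
  <div class="cell cell__mobile-basic">
    <div class="cell__label"> 1 Yr Return </div>
    <div class="cell__value cell__value_down"> -12.66% </div>
  </div>
  <div class="cell cell__mobile-basic cell__visible__even">
    <div class="cell__label"> YTD Return </div>
    <div class="cell__value cell__value_up"> 2.06% </div>
  </div>
  <div class="cell ">
    <div class="cell__label"> Market Cap (t INR) </div>
    <div class="cell__value cell__value_"> 2.369 </div>
  </div>
</div>
"""


def parse_cells(html):
    """Return {label: value} for every cell in the detailed data table.

    parse_cells is a hypothetical helper, not part of bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    # "div.cell" matches any div whose class list contains the token "cell",
    # regardless of the other classes or stray whitespace in the attribute.
    for cell in soup.select("div.data-table_detailed div.cell"):
        label = cell.select_one("div.cell__label")
        # "div.cell__value" matches the _, _up and _down variants alike.
        value = cell.select_one("div.cell__value")
        if label is None or value is None:
            continue  # skip cells that lack either part
        result[label.get_text(strip=True)] = value.get_text(strip=True)
    return result


if __name__ == "__main__":
    for label, value in parse_cells(HTML).items():
        print("%s: %s" % (label, value))
```

Returning a dict keyed by the label text makes it straightforward to look up a single field, for example `parse_cells(HTML)["YTD Return"]`, and the labels could then be mapped to whatever field names are needed.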
