Web Scraping using python and bs4

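I am trying to scrape data from the markup below, taken from this URL (Bloomberg).

I want to get the text of the following labels and their corresponding values: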

1. open
2. previous close
3. ytd return
4. market_cap
5. day range
6. 52wk_range
7. current_per_ratio
8. shares_outstanding
9. volume
10. one_year_return
11. Earnings_per_share
12. price_sales
13. dividend_indicated_gross_yield

I tried but failed, and I don't know the correct way to do this with bs4 in python.

Please guide me on how to achieve it the way I want.

<div class="data-table data-table_detailed">
  <div class="cell cell__mobile-basic cell__visible__even">
    <div class="cell__label"> Open </div>
    <div class="cell__value cell__value_"> 1,040.40 </div>
  </div>
  <div class="cell cell__mobile-basic">
    <div class="cell__label"> Day Range </div>
    <div class="cell__value cell__value_"> 1,026.00 - 1,044.00 </div>
  </div>
  <div class="cell cell__mobile-basic cell__visible__even">
    <div class="cell__label"> Volume </div>
    <div class="cell__value cell__value_"> 2,580,677 </div>
  </div>
  <div class="cell cell__mobile-basic">
    <div class="cell__label"> Previous Close </div>
    <div class="cell__value cell__value_"> 1,040.45 </div>
  </div>
  <div class="cell cell__mobile-basic cell__visible__even">
    <div class="cell__label"> 52Wk Range </div>
    <div class="cell__value cell__value_"> 900.30 - 1,279.30 </div>
  </div>
  <div class="cell cell__mobile-basic">
    <div class="cell__label"> 1 Yr Return </div>
    <div class="cell__value cell__value_down"> -12.66% </div>
  </div>
  <div class="cell cell__mobile-basic cell__visible__even">
    <div class="cell__label"> YTD Return </div>
    <div class="cell__value cell__value_up"> 2.06% </div>
  </div>
  <div class="cell ">
    <div class="cell__label"> Current P/E Ratio (TTM) </div>
    <div class="cell__value cell__value_"> 16.43 </div>
  </div>
  <div class="cell  cell__visible__even">
    <div class="cell__label"> Earnings per Share (INR) (TTM) </div>
    <div class="cell__value cell__value_"> 62.76 </div>
  </div>
  <div class="cell ">
    <div class="cell__label"> Market Cap (t INR) </div>
    <div class="cell__value cell__value_"> 2.369 </div>
  </div>
  <div class="cell  cell__visible__even">
    <div class="cell__label"> Shares Outstanding  (b) </div>
    <div class="cell__value cell__value_"> 2.297 </div>
  </div>
  <div class="cell ">
    <div class="cell__label"> Price/Sales (TTM) </div>
    <div class="cell__value cell__value_"> 3.47 </div>
  </div>
  <div class="cell  cell__visible__even">
    <div class="cell__label"> Dividend Indicated Gross Yield </div>
    <div class="cell__value cell__value_"> 2.45% </div>
  </div>
</div>

for cell in soup.find_all("div", class_='data-table data-table_detailed'):
    name = ""
    namecell = cell.find("div", class_="cell__label", text=True)
    if namecell is not None:
         name = namecell.get_text(strip=True)
    price_chage = cell.find("div", class_="cell__value cell__value").get_text(strip=True)
    data+=( "%s: Price Change:  %s," % (name, price_chage))
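
A likely reason the attempt above fails, offered as a sketch rather than a definitive diagnosis: the loop visits only the single outer `data-table` div, and a `class_` string that contains a space is compared against the whole class attribute, so `"cell__value cell__value"` never equals `"cell__value cell__value_"`. `find` then returns `None`, and calling `.get_text` on it raises. The `html` string below is a trimmed, illustrative copy of the markup, not the live page.

```python
from bs4 import BeautifulSoup

# Trimmed, illustrative copy of the markup above (not fetched from the live page).
html = """
<div class="data-table data-table_detailed">
  <div class="cell"><div class="cell__label"> Open </div>
    <div class="cell__value cell__value_"> 1,040.40 </div></div>
  <div class="cell"><div class="cell__label"> YTD Return </div>
    <div class="cell__value cell__value_up"> 2.06% </div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The outer table matches exactly once, so a loop over it runs a single time.
tables = soup.find_all("div", class_="data-table data-table_detailed")

# Exact-string match against the full class attribute: no div has this value.
missing = soup.find("div", class_="cell__value cell__value")

# Matching on the single shared class finds every value div, whatever its suffix.
values = [d.get_text(strip=True)
          for d in soup.find_all("div", class_="cell__value")]
```

Here `missing` is `None`, which is why the original `.get_text(strip=True)` call cannot succeed, while matching on `cell__value` alone returns both value divs.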

To everyone who passed by and downvoted: here is the code I ended up writing. Congratulate me! And please don't downvote if you can't help anyone.

# Assumes `soup` is a BeautifulSoup object already built from the page HTML.
tbl_data = []
for cell in soup.find_all("div", class_='cell '):
    # Label text of the current cell
    namecell = cell.find("div", class_="cell__label", text=True).get_text(strip=True)
    # The value div carries one of three class suffixes: _down, _up, or none
    if cell.find("div", class_="cell__value cell__value_down", text=True):
        classText = "cell__value cell__value_down"
    elif cell.find("div", class_="cell__value cell__value_up", text=True):
        classText = "cell__value cell__value_up"
    else:
        classText = "cell__value cell__value_"

    value = cell.find("div", class_=classText, text=True).get_text(strip=True)
    if namecell and value is not None:
        tbl_data.append(namecell + ":" + value)
print(tbl_data)

The output is:

[u'Open:1,040.40', u'Day Range:1,026.00 - 1,044.00', u'Volume:2,580,677', u'Previous Close:1,040.45', u'52Wk Range:900.30 - 1,279.30', u'1 Yr Return:-12.66%', u'Current P/E Ratio (TTM):16.43', u'Earnings per Share (INR) (TTM):62.76', u'Market Cap (t INR):2.369', u'Shares Outstanding  (b):2.297', u'Price/Sales (TTM):3.47', u'Dividend Indicated Gross Yield:2.45%']
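
As a variation on the code above, one could match each cell by the single class `cell` and read its label and value with CSS selectors, which avoids branching on the `_up` / `_down` suffix. This is only a sketch: the helper names `parse_cells` and `to_key` are made up for illustration, and `sample` is a trimmed copy of the markup from the question rather than the live page.

```python
import re
from bs4 import BeautifulSoup

def to_key(label):
    """Hypothetical helper: 'Shares Outstanding (b)' -> 'shares_outstanding_b'."""
    return re.sub(r"[^0-9a-z]+", "_", label.lower()).strip("_")

def parse_cells(html):
    """Hypothetical helper: map each cell's label to its value text."""
    soup = BeautifulSoup(html, "html.parser")
    data = {}
    # 'div.cell' matches any div whose class list contains 'cell',
    # regardless of extra classes or stray whitespace in the attribute.
    for cell in soup.select("div.data-table_detailed div.cell"):
        label = cell.select_one("div.cell__label")
        value = cell.select_one("div.cell__value")
        if label is None or value is None:
            continue  # skip malformed cells instead of raising
        # Collapse runs of whitespace inside the label before building the key.
        name = " ".join(label.get_text().split())
        data[to_key(name)] = value.get_text(strip=True)
    return data

# Trimmed, illustrative sample (assumption: same structure as the question's markup).
sample = """
<div class="data-table data-table_detailed">
  <div class="cell cell__mobile-basic cell__visible__even">
    <div class="cell__label"> Open </div>
    <div class="cell__value cell__value_"> 1,040.40 </div></div>
  <div class="cell cell__mobile-basic">
    <div class="cell__label"> 1 Yr Return </div>
    <div class="cell__value cell__value_down"> -12.66% </div></div>
  <div class="cell cell__mobile-basic cell__visible__even">
    <div class="cell__label"> YTD Return </div>
    <div class="cell__value cell__value_up"> 2.06% </div></div>
  <div class="cell  cell__visible__even">
    <div class="cell__label"> Shares
      Outstanding  (b) </div>
    <div class="cell__value cell__value_"> 2.297 </div></div>
</div>
"""
result = parse_cells(sample)
print(result)
```

Returning a dict keyed by a normalized label makes it easy to look up a single field, for example `result["ytd_return"]`; the key format is a design choice for this sketch, not something the page itself defines.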


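Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.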
