Scraping headlines from Yahoo Finance using Python

I am using Beautiful Soup to extract headlines from this page: http://in.finance.yahoo.com/q?s=AAPL. However, I need the headlines for the past 3 months, i.e. from 10 Dec 2013 to 10 March 2014, and I am only able to extract the headlines that are on this specific page. How can I extract the required headlines for any specific company?

Code:

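import urllib2
from bs4 import BeautifulSoup  # assumes BeautifulSoup 4

url = 'http://in.finance.yahoo.com/q?s=AAPL'
data = urllib2.urlopen(url)
soup = BeautifulSoup(data)

# The headlines sit in a <ul> inside div#yfi_headlines > div.bd
divs = soup.find('div', attrs={'id': 'yfi_headlines'})
div = divs.find('div', attrs={'class': 'bd'})
ul = div.find('ul')
lis = ul.findAll('li')
hls = []
for li in lis:
    # The headline text is the first child of each item's link
    headline = li.find('a').contents[0]
    hls.append(headline)
    print headline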

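I think your problem is about where you are getting the data from. If you need data for the last three months, you should query http://in.finance.yahoo.com/q/hp?s=AAPL instead; all the data you are looking for is shown in the table on that page.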

On http://in.finance.yahoo.com/q?s=AAPL , click on 'more headlines from AAPL'. From there you'll get a link that has a datetime field in it (the t parameter). Modify that and you should be good. ( http://in.finance.yahoo.com/q/h?s=AAPL&t=2014-02-08T15:06:40+05:30 )
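
As a rough sketch of that idea, the snippet below only builds candidate URLs by varying the t parameter across the requested date range; it does not fetch anything. The helper names (headline_page_url, headline_page_urls), the fixed +05:30 offset and the 7-day step are assumptions made for illustration, and whether the site accepts arbitrary t values was not verified here.

```python
from datetime import datetime, timedelta

BASE_URL = 'http://in.finance.yahoo.com/q/h'  # headlines page from the link above
UTC_OFFSET = '+05:30'  # offset seen in the example link; assumed to be fixed


def headline_page_url(symbol, when):
    """Build a headlines URL whose t parameter is the given naive datetime."""
    stamp = when.strftime('%Y-%m-%dT%H:%M:%S') + UTC_OFFSET
    return '%s?s=%s&t=%s' % (BASE_URL, symbol, stamp)


def headline_page_urls(symbol, start, end, step_days=7):
    """Return URLs stepping back from end to start (the step size is a guess)."""
    urls = []
    when = end
    while when >= start:
        urls.append(headline_page_url(symbol, when))
        when -= timedelta(days=step_days)
    return urls


# Hypothetical usage: cover 10 Dec 2013 to 10 March 2014 for AAPL
urls = headline_page_urls('AAPL', datetime(2013, 12, 10), datetime(2014, 3, 10))
for url in urls:
    print(url)
```

Each generated URL could then be passed through the same parsing code as in the question, provided the headlines page uses the same markup, which would need to be checked. A smaller step may be needed if a page lists fewer headlines than are published in a week.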
