Web scraping stocks from an HTML website using Python
I'm trying to scrape stock tickers from a website whose page source looks like this:
<thead>
<tr>
<th>Company</th>
<th>Symbol</th>
<th>Weight</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="http://www.google.com/finance?q=AAPL">Apple Inc.</a></td>
<td><form action="/charts" method="post"> <div><input type="hidden" name="symbol" value="AAPL"/> <input type="submit" value="AAPL"/> </div></form></td>
<td>3.635302</td>
</tr>
So far, my Python code (below) only writes the company name ("Apple Inc.") and the weight (3.635) into the CSV file, but I'd like to include the ticker 'AAPL' as well. On the website the tickers are formatted as hyperlinks, and I'm not sure how to scrape that data.
import requests
from bs4 import BeautifulSoup

url = "http://slickcharts.com/sp500"
r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, "html5lib")
table = soup.find_all('table')[0]
rows = table.find_all('tr')[1:]  # skip the header row
data = {
    'Company': [],
    'Symbol': [],
    'Weight': []
}
for row in rows:
    cols = row.find_all('td')
    data['Company'].append(cols[0].get_text())
    data['Symbol'].append(cols[1].get_text())  # this column comes back without the ticker
    data['Weight'].append(cols[2].get_text())
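The question mentions writing the results to a CSV file but doesn't show that step. A minimal sketch, assuming the `data` dict of equal-length lists built above and the standard `csv` module; the helper name `write_columns_csv` and the file name `sp500.csv` are hypothetical.

```python
import csv
import io


def write_columns_csv(columns, fileobj):
    """Write a dict of equal-length lists as CSV: a header row, then one row per index."""
    headers = list(columns)
    writer = csv.writer(fileobj)
    writer.writerow(headers)
    # zip(*...) turns the per-column lists into per-row tuples
    writer.writerows(zip(*(columns[h] for h in headers)))


# One already-scraped row; in the real script you would pass the `data` dict.
sample = {'Company': ['Apple Inc.'], 'Symbol': ['AAPL'], 'Weight': ['3.635302']}
buffer = io.StringIO()
write_columns_csv(sample, buffer)
print(buffer.getvalue())

# Writing to disk (hypothetical file name); newline='' avoids blank lines on Windows:
# with open('sp500.csv', 'w', newline='') as f:
#     write_columns_csv(data, f)
```

Building rows with zip keeps the column-oriented dict from the question unchanged; if one list ends up shorter than the others, zip silently truncates, so it is worth checking the lengths first.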
There is nothing but whitespace in your cols[1].get_text(): the ticker isn't text inside that cell, it is the value attribute of the <input> tags.
You need data['Symbol'].append(cols[1].find('input')['value']), which reads the value of the first <input> in the cell (the hidden one named symbol).
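A self-contained sketch of that fix, run against the snippet from the question instead of the live page (whose markup may have changed since). It assumes only bs4 is installed and uses the built-in "html.parser" rather than html5lib; selecting the input by name="symbol" is a choice made here so the code doesn't depend on the order of the two inputs, not something the answer requires.

```python
from bs4 import BeautifulSoup

# The row from the question's page source, wrapped in <table> so it parses on its own
SNIPPET = """
<table>
<thead><tr><th>Company</th><th>Symbol</th><th>Weight</th></tr></thead>
<tbody>
<tr>
<td><a href="http://www.google.com/finance?q=AAPL">Apple Inc.</a></td>
<td><form action="/charts" method="post"> <div><input type="hidden" name="symbol" value="AAPL"/> <input type="submit" value="AAPL"/> </div></form></td>
<td>3.635302</td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(SNIPPET, "html.parser")
rows = soup.find("table").find_all("tr")[1:]  # skip the header row

results = []
for row in rows:
    cols = row.find_all("td")
    # get_text() on the Symbol cell only returns the whitespace between the tags
    cell_text = cols[1].get_text()
    # The ticker lives in the value attribute of the hidden <input name="symbol">
    symbol = cols[1].find("input", attrs={"name": "symbol"})["value"]
    results.append((cols[0].get_text(), symbol, cols[2].get_text()))
    print(repr(cell_text), "->", symbol)

print(results)
```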
Alternatively, you can get the ticker from the company link: find the <a> tag, read its href attribute, and split the link on =. The second element of the resulting list is the required AAPL, as in the code below.
import requests
from bs4 import BeautifulSoup

url = "http://slickcharts.com/sp500"
r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, "html5lib")
table = soup.find_all('table')[0]
rows = table.find_all('tr')[1:]  # skip the header row
data = {
    'Company': [],
    'Symbol': [],
    'Weight': [],
    'q': []
}
for row in rows:
    cols = row.find_all('td')
    data['Company'].append(cols[0].get_text())
    data['Symbol'].append(cols[1].get_text())
    data['Weight'].append(cols[2].get_text())
    # href is e.g. "http://www.google.com/finance?q=AAPL"; the part after "=" is the ticker
    data['q'].append(cols[0].find("a").get("href").split("=")[1])
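Splitting on = works for this particular link, but it returns the wrong piece if the URL ever carries a second query parameter (the `&ei=xyz` URL below is a made-up example). A slightly more robust sketch swaps in the standard library's `urllib.parse` to read the `q` parameter by name; the helper name `ticker_from_href` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def ticker_from_href(href):
    """Return the value of the 'q' query parameter, or None if it is absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get("q")
    return values[0] if values else None


href = "http://www.google.com/finance?q=AAPL"
print(href.split("=")[1])      # the split approach: 'AAPL'
print(ticker_from_href(href))  # 'AAPL'

# Made-up URL with a second parameter: split("=")[1] now gives 'AAPL&ei'
longer = "http://www.google.com/finance?q=AAPL&ei=xyz"
print(longer.split("=")[1])
print(ticker_from_href(longer))  # still 'AAPL'
```

In the loop above, the last line would then become data['q'].append(ticker_from_href(cols[0].find("a").get("href"))).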