
Web scraping stocks from an HTML website using python

I'm trying to scrape stock tickers from a website with a page source that looks like this:

<thead>
                            <tr>
                                <th>Company</th>
                                 <th>Symbol</th>
                                 <th>Weight</th>
                        </tr>
                    </thead>


                    <tbody>

                        <tr>
                            <td><a href="http://www.google.com/finance?q=AAPL">Apple Inc.</a></td>
                            <td><form action="/charts" method="post"> <div><input type="hidden" name="symbol" value="AAPL"/> <input type="submit" value="AAPL"/> </div></form></td>
                            <td>3.635302</td>
                        </tr>

So far, my Python code (below) is only returning the name of the company ("Apple Inc.") and the weight of 3.635 into the CSV file, but I'd like to include the ticker 'AAPL'. On the website the tickers are formatted as a hyperlink, and I'm not sure how to scrape that data.

import requests
from bs4 import BeautifulSoup

url = "http://slickcharts.com/sp500"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html5lib")

# first table on the page; skip the header row
table = soup.find_all('table')[0]
rows = table.find_all('tr')[1:]

data = {
    'Company' : [],
    'Symbol' : [],
    'Weight' : []
}

for row in rows:
    cols = row.find_all('td')
    data['Company'].append(cols[0].get_text())
    data['Symbol'].append(cols[1].get_text())
    data['Weight'].append(cols[2].get_text())

There is nothing in your cols[1].get_text(); that cell contains a form, not plain text.

You need data['Symbol'].append(cols[1].find('input')['value'])
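
A minimal sketch of the corrected loop, assuming the table layout shown in the question (the Symbol cell holds a hidden <input> whose value attribute carries the ticker); the output filename sp500.csv is just a placeholder:

import csv

import requests
from bs4 import BeautifulSoup

url = "http://slickcharts.com/sp500"
soup = BeautifulSoup(requests.get(url).text, "html5lib")

# first table on the page; skip the header row
rows = soup.find_all('table')[0].find_all('tr')[1:]

with open('sp500.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Company', 'Symbol', 'Weight'])
    for row in rows:
        cols = row.find_all('td')
        company = cols[0].get_text(strip=True)
        # the Symbol cell contains a form, so read the hidden input's value attribute
        symbol = cols[1].find('input')['value']
        weight = cols[2].get_text(strip=True)
        writer.writerow([company, symbol, weight])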

You can get the ticker by finding the <a> tag and getting its href attribute as shown below; splitting the link on = gives a list whose second element is the required AAPL.

import requests
from bs4 import BeautifulSoup

url = "http://slickcharts.com/sp500"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html5lib")

# first table on the page; skip the header row
table = soup.find_all('table')[0]
rows = table.find_all('tr')[1:]

data = {
    'Company': [],
    'Symbol': [],
    'Weight': [],
    'q': []
}

for row in rows:
    cols = row.find_all('td')
    data['Company'].append(cols[0].get_text())
    data['Symbol'].append(cols[1].get_text())
    data['Weight'].append(cols[2].get_text())
    # the company link looks like http://www.google.com/finance?q=AAPL,
    # so splitting the href on "=" leaves the ticker as the second element
    data['q'].append(cols[0].find("a").get("href").split("=")[1])
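
If you also want the result in a CSV file, as mentioned in the question, the collected dict can be written out with pandas; this is a sketch under the assumption that pandas is installed, and the filename sp500.csv is again just a placeholder:

import pandas as pd

# build a DataFrame from the scraped columns and write it to disk
df = pd.DataFrame(data, columns=['Company', 'Symbol', 'Weight', 'q'])
df.to_csv('sp500.csv', index=False)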
