
Web scraping stocks from an HTML website using python

I'm trying to scrape stock tickers from a website with a page source that looks like this:

<thead>
                            <tr>
                                <th>Company</th>
                                 <th>Symbol</th>
                                 <th>Weight</th>
                        </tr>
                    </thead>


                    <tbody>

                        <tr>
                            <td><a href="http://www.google.com/finance?q=AAPL">Apple Inc.</a></td>
                            <td><form action="/charts" method="post"> <div><input type="hidden" name="symbol" value="AAPL"/> <input type="submit" value="AAPL"/> </div></form></td>
                            <td>3.635302</td>
                        </tr>

So far, my Python code (below) is only returning the name of the company ("Apple Inc.") and the weight of 3.635 into the CSV file, but I'd like to include the ticker 'AAPL'. On the website the tickers are formatted as a hyperlink, and I'm not sure how to scrape that data.

import requests
from bs4 import BeautifulSoup

url = "http://slickcharts.com/sp500"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html5lib")

# first table on the page; skip the header row
table = soup.find_all('table')[0]
rows = table.find_all('tr')[1:]

data = {
    'Company' : [],
    'Symbol' : [],
    'Weight' : []
}

for row in rows:
    cols = row.find_all('td')
    data['Company'].append(cols[0].get_text())
    data['Symbol'].append(cols[1].get_text())
    data['Weight'].append(cols[2].get_text())

There is nothing in your cols[1].get_text(); that cell contains a form, not plain text.

You need data['Symbol'].append(cols[1].find('input')['value'])
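
A minimal sketch of the corrected loop, assuming the table layout shown in the question (the Symbol cell holds a hidden <input> whose value attribute carries the ticker); the output filename sp500.csv is just a placeholder:

import csv

import requests
from bs4 import BeautifulSoup

url = "http://slickcharts.com/sp500"
soup = BeautifulSoup(requests.get(url).text, "html5lib")

# first table on the page; skip the header row
rows = soup.find_all('table')[0].find_all('tr')[1:]

with open('sp500.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Company', 'Symbol', 'Weight'])
    for row in rows:
        cols = row.find_all('td')
        company = cols[0].get_text(strip=True)
        # the Symbol cell contains a form, so read the hidden input's value attribute
        symbol = cols[1].find('input')['value']
        weight = cols[2].get_text(strip=True)
        writer.writerow([company, symbol, weight])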

You can get the ticker by finding the <a> tag and getting its href attribute as shown below; splitting the link on = gives a list whose second element is the required AAPL.

import requests
from bs4 import BeautifulSoup

url = "http://slickcharts.com/sp500"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html5lib")

# first table on the page; skip the header row
table = soup.find_all('table')[0]
rows = table.find_all('tr')[1:]

data = {
    'Company': [],
    'Symbol': [],
    'Weight': [],
    'q': []
}

for row in rows:
    cols = row.find_all('td')
    data['Company'].append(cols[0].get_text())
    data['Symbol'].append(cols[1].get_text())
    data['Weight'].append(cols[2].get_text())
    # the company link looks like http://www.google.com/finance?q=AAPL,
    # so splitting the href on "=" leaves the ticker as the second element
    data['q'].append(cols[0].find("a").get("href").split("=")[1])
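
If you also want the result in a CSV file, as mentioned in the question, the collected dict can be written out with pandas; this is a sketch under the assumption that pandas is installed, and the filename sp500.csv is again just a placeholder:

import pandas as pd

# build a DataFrame from the scraped columns and write it to disk
df = pd.DataFrame(data, columns=['Company', 'Symbol', 'Weight', 'q'])
df.to_csv('sp500.csv', index=False)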
