
Trouble importing data from a CSV file, looping through the tickers, scraping data from finviz.com, and exporting to a CSV file for more analysis

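I am having trouble putting everything together. I want to (1) read a list of tickers from a CSV file, (2) loop through all of the tickers on finviz.com to get certain key data points, and (3) export all of the data to another CSV file for more analysis. Here is my code so far.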

import pandas as pd
from bs4 import BeautifulSoup as bs
import requests
import csv
import time
import datetime
from datetime import datetime as dt

Symbol = []

with open('shortlist.csv') as csvDataFile:
    csvReader = csv.reader(csvDataFile)
    for row in csvReader:
        Symbol.append(row[0])


def get_fundamental_data(df):
    for symbol in df.index:
        try:
            url = 'http://finviz.com/quote.ashx?t=' + symbol.lower()
            soup = bs(requests.get(url).content, features='html5lib')
            for m in df.columns:
                df.loc[symbol, m] = fundamental_metric(soup, m)
        except Exception, e:
            print (symbol, 'not found')
    return df


def fundamental_metric(soup, metric):
    return soup.find(text=metric).find_next(class_='snapshot-td2').text


metric = [
    # 'Inst Own',
    # 'Insider Own',
    'Price',
    'Shs Outstand',
    'Shs Float',
    'Short Float',
    'Short Ratio',
    'Book/sh',
    'Cash/sh',
    'Rel Volume',
    'Earnings',
    'Avg Volume',
    'Volume',
]
df = pd.DataFrame(index=symbol, columns=metric)
df = get_fundamental_data(df)

print df

df.to_csv('finviz_' + time.strftime('%Y-%m-%d') + '.csv')

Attached is the shortlist.csv I am trying to import: [image]

The error I get is: [image]

I am using Python 3 in PyCharm.

The result should look like this: [image]

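Your `symbol` is defined inside the function get_fundamental_data(), where it is the loop variable.
You cannot use `symbol` outside of that for loop or function; the list read from the CSV file is named `Symbol`.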
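
To illustrate the scope point, here is a minimal sketch that reads the tickers into one list and passes that same list to the code that builds the table, so no loop variable is used outside its loop. The names `load_symbols`, `build_table`, and `fake_lookup` are hypothetical, and `fake_lookup` is an offline stand-in for the finviz request, not real data. In the question's script the equivalent change would likely be `pd.DataFrame(index=Symbol, columns=metric)`. Note also that `except Exception, e:` and `print df` are Python 2 syntax; under Python 3 they would need to be `except Exception as e:` and `print(df)`.

```python
import csv
import io


def load_symbols(csv_file):
    """Return the first column of each non-empty row as a list of tickers."""
    return [row[0].strip() for row in csv.reader(csv_file) if row and row[0].strip()]


def build_table(symbols, metrics, lookup):
    """Return {symbol: {metric: value}} for every symbol that can be looked up.

    `lookup(symbol, metric)` is a hypothetical callable; in the real script it
    would wrap the finviz request and the BeautifulSoup search.
    """
    table = {}
    for symbol in symbols:  # `symbol` only has meaning inside this loop
        try:
            table[symbol] = {m: lookup(symbol, m) for m in metrics}
        except Exception as exc:  # Python 3 spelling of `except Exception, e`
            print(symbol, "not found:", exc)
    return table


def fake_lookup(symbol, metric):
    """Offline stand-in for the web request (assumption, not finviz data)."""
    if symbol == "BAD":
        raise KeyError(metric)
    return f"{symbol}:{metric}"


# In-memory stand-in for shortlist.csv: one ticker per row.
shortlist = io.StringIO("AAPL\nMSFT\n\nBAD\n")
symbols = load_symbols(shortlist)
table = build_table(symbols, ["Price", "Volume"], fake_lookup)
print(table)
```

Passing the list in as an argument keeps the function independent of any module-level name, which avoids the `Symbol` versus `symbol` mix-up altogether.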

One of these three methods should get you very close to where you want to be.

Method 1:

import csv
import requests
from bs4 import BeautifulSoup
from requests.exceptions import HTTPError

url_base = "https://finviz.com/quote.ashx?t="
tckr = ['SBUX','MSFT','AAPL']
url_list = [url_base + s for s in tckr]

with open('C:\\Users\\Excel\\Downloads\\SO.csv', 'a', newline='') as f:
    writer = csv.writer(f)

    for url in url_list:
        try:
            fpage = requests.get(url)
            fpage.raise_for_status()  # raise HTTPError on 4xx/5xx responses
            fsoup = BeautifulSoup(fpage.content, 'html.parser')

            # write header row (the label cells)
            writer.writerow([e.text for e in fsoup.find_all('td', {'class':'snapshot-td2-cp'})])

            # write body row (the value cells)
            writer.writerow([e.text for e in fsoup.find_all('td', {'class':'snapshot-td2'})])
        except HTTPError:
            print("{} - not found".format(url))
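
Method 1 writes a label row and a value row for every ticker, so the labels repeat in the output file. If a single header is preferred, one option is to pair each label with its value and write the rows with `csv.DictWriter`. The sketch below assumes the label and value cell texts have already been extracted into lists; the helper names `snapshot_to_row` and `write_rows` are hypothetical and the cell texts are made-up sample strings, not finviz data.

```python
import csv
import io


def snapshot_to_row(ticker, labels, values):
    """Pair each label with its value and tag the row with the ticker."""
    row = {"Ticker": ticker}
    row.update(zip(labels, values))
    return row


def write_rows(rows, out_file):
    """Write the rows with a single header covering every label seen."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in labels that a given ticker did not provide
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Made-up cell texts standing in for the scraped label and value cells.
rows = [
    snapshot_to_row("SBUX", ["Price", "Volume"], ["1.00", "100"]),
    snapshot_to_row("MSFT", ["Price", "Short Ratio"], ["2.00", "3.5"]),
]
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In a real run, `out_file` would be the file opened with `newline=''`, and the rows would be collected inside the loop over `url_list` before being written once at the end.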

Method 2:

import pandas
import requests
from bs4 import BeautifulSoup

base_url = 'http://finviz.com/screener.ashx?v=152&s=ta_topgainers&o=price&c=0,1,2,3,4,5,6,7,25,63,64,65,66,67'
html = requests.get(base_url)
soup = BeautifulSoup(html.content, "html.parser")
main_div = soup.find('div', attrs = {'id':'screener-content'})

light_rows = main_div.find_all('tr', class_="table-light-row-cp")
dark_rows = main_div.find_all('tr', class_="table-dark-row-cp")

data = []
for rows_set in (light_rows, dark_rows):
    for row in rows_set:
        row_data = []
        for cell in row.find_all('td'):
            val = cell.a.get_text()
            row_data.append(val)
        data.append(row_data)

# sort rows by the first column to restore the original order
data.sort(key=lambda x: int(x[0]))

pandas.DataFrame(data).to_csv("AAA.csv", header=False)

Method 3:

import csv
import requests
from bs4 import BeautifulSoup
from requests.exceptions import HTTPError

url_base = "https://finviz.com/quote.ashx?t="
tckr = ['SBUX','MSFT','AAPL']
url_list = [url_base + s for s in tckr]

with open('C:/Users/Excel/Desktop/today.csv', 'a', newline='') as f:
    writer = csv.writer(f)

    for url in url_list:
        try:
            fpage = requests.get(url)
            fpage.raise_for_status()  # raise HTTPError on 4xx/5xx responses
            fsoup = BeautifulSoup(fpage.content, 'html.parser')

            # write header row (the label cells)
            writer.writerow([e.text for e in fsoup.find_all('td', {'class':'snapshot-td2-cp'})])

            # write body row (the value cells)
            writer.writerow([e.text for e in fsoup.find_all('td', {'class':'snapshot-td2'})])
        except HTTPError:
            print("{} - not found".format(url))
