Stock data mining for a company

I'm new to Python and have hit a roadblock on a problem I need to solve. I'm looking to calculate the monthly and yearly average price for Google, and to report the best and worst six months and the best and worst six years for Google from 2004 to Oct 2019. The average price is defined as ((v1*c1)+(v2*c2)+(v3*c3)+(v4*c4)...+(vn*cn)) / (v1+v2+v3+v4...+vn), where vi is the volume for day i and ci is the adjusted close price for day i.

I've been able to import the data from the web, but now I'm trying to get at least one year into a tuple and work from there, and I can't figure out how to convert the data I have into a tuple. I tried function = tuple but got errors. I know how to make a tuple, but not how to make one from the data I have, whether I'm missing a step to get the data into one, or whether I'm going off track with this approach.

Here is my code so far. If anyone could point me in the right direction, it would be appreciated.

# url = "http://193.1.33.31:88/pa1/GOOGL.csv"


import csv

import begin
from read_from_file_or_net import get_stuff_from_net as gn

def main(csv_file: 'URL of CSV file'):
# def main(csv_file):

    try:
        print(f"{csv_file}")
        my_file = gn(csv_file)
        # with open(f".cache/{my_file}", "w") as output:
        #     output.write(my_file)

        my_file = my_file.split("\n")

        for row in my_file:
            row = row.strip().split(",")
            for cell in row:
                if cell.isalpha():
                    print(f"{cell}"+"", end="")
                elif "-" in cell:
                    print(f"{cell}", end="")
                elif "." in cell:
                    print(f"{float(cell):>10.2f}", end="")
                elif cell.isnumeric():
                    print(f"{int(cell):>15d}", end="")
                elif not cell.isspace():
                    print(f"{cell}", end="  ")
                #elif cell.istitle():
                   # print(f"{cell}", end="")
                #else:
                    #print("?", end="")
            print()

    except Exception as e:
        print(f"{e}")


if __name__ == "__main__":
    main("http://193.1.33.31:88/pa1/GOOGL.csv")

You should use pandas for this. It has many powerful functions that don't need a for-loop.

You can read the CSV directly from the web page:

import pandas as pd

df = pd.read_csv('http://193.1.33.31:88/pa1/GOOGL.csv')

print(df.columns)
print(df.head())

You can select one year, e.g. 2018:

year2018 = df[ (df['Date'] >= '2018-01-01') & (df['Date'] < '2019-01-01') ]

And calculate your value:

result = (year2018['Volume'] * year2018['Adj Close']).sum() / year2018['Volume'].sum()

print(result)
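To check that this expression matches the formula in the question, here is a tiny sketch of the same volume-weighted average in plain Python. The three days of volumes and prices are made up for illustration, not real GOOGL data, and `weighted_average` is a hypothetical helper name.

```python
# Made-up sample: volume and adjusted close for three trading days
volumes = [100, 300, 600]
adj_closes = [10.0, 20.0, 30.0]


def weighted_average(volumes, prices):
    """Return sum(v_i * c_i) / sum(v_i), the average price defined in the question."""
    weighted_sum = sum(v * c for v, c in zip(volumes, prices))
    return weighted_sum / sum(volumes)


average = weighted_average(volumes, adj_closes)
print(average)  # (100*10 + 300*20 + 600*30) / 1000 = 25.0
```

The high-volume day pulls the result towards its price, which is why the answer is 25.0 rather than the plain mean of 20.0.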

EDIT: Similar for other years

for year in range(2004, 2019):
    year = str(year)
    data = df[ df['Date'].str.startswith(year) ]
    result = (data['Volume'] * data['Adj Close']).sum() / data['Volume'].sum()

    print(year, result)

Result:

2004 80.44437157567273
2005 137.4076040074354
2006 203.03824165240846
2007 273.04059204266287
2008 227.86912213843564
2009 206.71221450434697
2010 268.65533171697064
2011 283.70689930771306
2012 322.70466840310667
2013 437.32701278816154
2014 567.9540540371448
2015 623.3613056057101
2016 757.9295821975054
2017 940.267270383813
2018 1115.287148437416
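The per-year loop could also be written with `groupby`, which covers the monthly averages the question asks for as well. This is only a sketch: it assumes the `Date` column holds strings in `YYYY-MM-DD` form (as the `str.startswith` filter above implies), and it uses a few made-up rows instead of the real CSV. The `Weighted` column and the `weighted_by` helper are names introduced here for illustration.

```python
import pandas as pd

# Made-up rows standing in for the downloaded CSV (not real GOOGL data)
df = pd.DataFrame({
    'Date': ['2018-01-02', '2018-01-03', '2018-02-01', '2019-01-02'],
    'Adj Close': [10.0, 20.0, 30.0, 40.0],
    'Volume': [100, 300, 200, 500],
})

# Volume * price for each day, so each group only needs two sums
df['Weighted'] = df['Volume'] * df['Adj Close']


def weighted_by(keys):
    """Volume-weighted average price for each distinct value in keys."""
    sums = df.groupby(keys)[['Weighted', 'Volume']].sum()
    return sums['Weighted'] / sums['Volume']


yearly = weighted_by(df['Date'].str[:4])    # key is 'YYYY'
monthly = weighted_by(df['Date'].str[:7])   # key is 'YYYY-MM'

print(yearly)
print(monthly)

# Lowest and highest months; use 6 instead of 2 on the full data
print(monthly.nsmallest(2))
print(monthly.nlargest(2))
```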

EDIT: If you keep the results in a list as tuples (result, year), then you can sort them to get the best and the worst years:

import pandas as pd

df = pd.read_csv('http://193.1.33.31:88/pa1/GOOGL.csv')
#df['Date'] = pd.to_datetime(df['Date'])

#print(df.columns)

year2018 = df[ (df['Date'] >= '2018-01-01') & (df['Date'] < '2019-01-01') ]

result = (year2018['Volume'] * year2018['Adj Close']).sum() / year2018['Volume'].sum()

#print(result)

all_results = []
for year in range(2004, 2019):
    year = str(year)
    data = df[ df['Date'].str.startswith(year) ]
    result = (data['Volume'] * data['Adj Close']).sum() / data['Volume'].sum()

    all_results.append( (result, year) )
    #print(year, result)

print('--- sorted by result ---')

sorted_results = sorted(all_results)

for result, year in sorted_results:
    print(year, result)

Result:

--- sorted by result ---
2004 80.44437157567273
2005 137.4076040074354
2006 203.03824165240846
2009 206.71221450434697
2008 227.86912213843564
2010 268.65533171697064
2007 273.04059204266287
2011 283.70689930771306
2012 322.70466840310667
2013 437.32701278816154
2014 567.9540540371448
2015 623.3613056057101
2016 757.9295821975054
2017 940.267270383813
2018 1115.287148437416

Using the slice sorted_results[:6] you can get the six worst years, and using sorted_results[-6:] you can get the six best years. You can also use reversed() if you want them in a different order.
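As a sketch of that slicing, the list below repeats the yearly figures from the output above, rounded to two decimals so it can run without downloading anything. The names `worst_six` and `best_six` are just illustrative.

```python
# (result, year) tuples, rounded from the yearly output printed earlier
all_results = [
    (80.44, '2004'), (137.41, '2005'), (203.04, '2006'), (273.04, '2007'),
    (227.87, '2008'), (206.71, '2009'), (268.66, '2010'), (283.71, '2011'),
    (322.70, '2012'), (437.33, '2013'), (567.95, '2014'), (623.36, '2015'),
    (757.93, '2016'), (940.27, '2017'), (1115.29, '2018'),
]

# Tuples sort by their first element, so this orders by average price
sorted_results = sorted(all_results)

worst_six = sorted_results[:6]                  # lowest averages, ascending
best_six = list(reversed(sorted_results[-6:]))  # highest averages, best first

print('worst:', [year for result, year in worst_six])
print('best: ', [year for result, year in best_six])
```

Putting the result first in each tuple is what lets plain `sorted()` work here; with (year, result) tuples you would need `sorted(all_results, key=lambda item: item[1])` instead.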


EDIT: Almost the same without pandas

import requests
import csv

def main(url):
    r = requests.get(url)

    lines = r.text.split('\n')

    headers = lines[0].split(',')

    data = []

    for line in lines[1:]:
        line = line.strip()
        if line: # skip empty lines
            row = line.strip().split(',')

            # convert string to float/int
            row[1] = float(row[1])
            row[2] = float(row[2])
            row[3] = float(row[3])
            row[4] = float(row[4])
            row[5] = float(row[5])
            row[6] = int(row[6])

            data.append(row)

    return headers, data


if __name__ == "__main__":    
    headers, data = main('http://193.1.33.31:88/pa1/GOOGL.csv')

    print(headers)

    print('--- data ---')
    print(data[0])
    print(data[-1])

    # get only year 2018

    year2018 = []
    for row in data:
        if '2018-01-01' <= row[0] < '2019-01-01':
            year2018.append(row)

    print('--- year 2018 ---')
    print(year2018[0])
    print(year2018[-1])

    # your calculation

    a = 0
    b = 0
    for row in year2018:
        a += row[5] * row[6]
        b += row[6]

    result = a/b

    print(result)
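The same idea extends to every month and year without pandas by grouping on a prefix of the date string. This is a rough sketch on made-up rows in the same layout as `data` above (date at index 0, adjusted close at index 5, volume at index 6, following the indexes used in the calculation above); `weighted_averages` is a hypothetical helper. It returns a list of (result, key) tuples, which is also one way to get the data into tuples as the question asked.

```python
from collections import defaultdict

# Made-up rows: [Date, Open, High, Low, Close, Adj Close, Volume]
data = [
    ['2018-01-02', 1.0, 1.0, 1.0, 1.0, 10.0, 100],
    ['2018-01-03', 1.0, 1.0, 1.0, 1.0, 20.0, 300],
    ['2018-02-01', 1.0, 1.0, 1.0, 1.0, 30.0, 200],
    ['2019-01-02', 1.0, 1.0, 1.0, 1.0, 40.0, 500],
]


def weighted_averages(rows, key_length):
    """Group rows by the first key_length characters of the date
    (4 -> 'YYYY', 7 -> 'YYYY-MM') and return (average, key) tuples."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum(v*c), sum(v)]
    for row in rows:
        key = row[0][:key_length]
        totals[key][0] += row[5] * row[6]
        totals[key][1] += row[6]
    return [(weighted / volume, key) for key, (weighted, volume) in totals.items()]


yearly = sorted(weighted_averages(data, 4))
monthly = sorted(weighted_averages(data, 7))

print(yearly)
print(monthly[:6])   # six worst months (fewer here, the sample is tiny)
print(monthly[-6:])  # six best months
```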
