Download specific columns of a CSV using requests.get

I am using requests.get to download a CSV file. I only need two columns from this file; the rest of the columns are useless to me. Currently I am using

import shutil
import requests
r = requests.get(finalurl, verify=False, stream=True)
shutil.copyfileobj(r.raw, csvfile)

to get the complete CSV file.

However, I only want to download two columns from the CSV file. I can always download the entire content and then take what is necessary.

Just checking whether there is a way to get specific columns using requests.get, e.g. from: http://chart.finance.yahoo.com/table.csv?s=AAPL&a=7&b=20&c=2016&d=8&e=20&f=2016&g=d&ignore=.csv

I need only the Date and Adj Close columns from this CSV file.

I couldn't find similar questions; please direct me to one if this has been asked before.

Thanks

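Try pandas; in your situation it is more convenient. Note that pandas.io.data was later removed from pandas and moved to the separate pandas-datareader package, so on newer versions the import becomes pandas_datareader.data.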

In [2]: import pandas.io.data as web
   ...: aapl = web.DataReader("AAPL", 'yahoo', '2016-7-20', '2016-8-20')
   ...: aapl['Adj Close']
Out[2]:
Date
2016-07-20     99.421412
2016-07-21     98.894269
2016-07-22     98.128421
2016-07-25     96.815526
2016-07-26     96.149138
2016-07-27    102.395300
2016-07-28    103.777810
2016-07-29    103.648513
2016-08-01    105.478603
2016-08-02    103.917063
2016-08-03    105.220002
2016-08-04    105.870003
2016-08-05    107.480003
2016-08-08    108.370003
2016-08-09    108.809998
2016-08-10    108.000000
2016-08-11    107.930000
2016-08-12    108.180000
2016-08-15    109.480003
2016-08-16    109.379997
2016-08-17    109.220001
2016-08-18    109.080002
2016-08-19    109.360001
Name: Adj Close, dtype: float64

You could use NumPy's loadtxt:

import numpy as np
# usecols takes zero-based column indices; skiprows=1 skips the header line
b = np.loadtxt(r'name.csv', dtype=str, delimiter=',', skiprows=1, usecols=(0, 1, 2))

This creates an array with data for only the columns you choose.
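
As a rough illustration of what usecols does, the sketch below runs loadtxt on a small in-memory CSV instead of name.csv. The header is an assumed layout based on the column names mentioned in the question, the two data rows are made-up placeholder values rather than real quotes, and picking indices 0 and 6 assumes Date comes first and Adj Close last.

```python
import io

import numpy as np

# Assumed column layout; the two data rows are placeholders, not real quotes.
sample = (
    "Date,Open,High,Low,Close,Volume,Adj Close\n"
    "2016-08-19,1.0,2.0,3.0,4.0,100,5.0\n"
    "2016-08-18,6.0,7.0,8.0,9.0,200,10.0\n"
)

# usecols takes zero-based indices: 0 is Date, 6 is Adj Close in this layout.
picked = np.loadtxt(io.StringIO(sample), dtype=str, delimiter=",",
                    skiprows=1, usecols=(0, 6))

for date, adj_close in picked:
    print(date, adj_close)
```

Note that loadtxt still reads every line of its input; usecols only controls which fields end up in the array.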

You cannot download only certain columns from this URL, although you can with the regular finance API. You also don't have to download all the data in one go and filter it afterwards; you can parse as you go:

import csv

import requests

final_url = "http://chart.finance.yahoo.com/table.csv?s=AAPL&a=7&b=20&c=2016&d=8&e=20&f=2016&g=d&ignore=.csv"
response = requests.get(final_url, verify=False, stream=True)
# On Python 3, iter_lines() yields bytes, so decode each line before parsing.
lines = (line.decode("utf-8") for line in response.iter_lines())
with open("out.csv", "w", newline="") as out:
    writer = csv.writer(out)
    # DictReader takes the field names from the first line it reads (the header).
    reader = csv.DictReader(lines)
    writer.writerow(["Date", "Adj Close"])
    for row in reader:
        writer.writerow([row["Date"], row["Adj Close"]])

You could just index into each row if the column order is guaranteed never to change, but using DictReader lets you access fields by key, so the order is irrelevant. I think it is also safe to presume there will be no newlines embedded in the data.
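
To make the point about key access concrete, here is a minimal sketch that runs the same selection over two hypothetical layouts holding the same columns in a different order. The helper name pick_columns and the sample rows are illustrative assumptions, not part of the original answer; the rows are placeholder values.

```python
import csv
import io


def pick_columns(lines, wanted):
    """Yield the wanted fields, looked up by header name, from CSV text lines."""
    reader = csv.DictReader(lines)
    for row in reader:
        yield [row[name] for name in wanted]


# Same columns, different order; the values are placeholders.
layout_a = "Date,Close,Adj Close\n2016-08-19,1.0,2.0\n"
layout_b = "Adj Close,Date,Close\n2.0,2016-08-19,1.0\n"

rows_a = list(pick_columns(io.StringIO(layout_a), ["Date", "Adj Close"]))
rows_b = list(pick_columns(io.StringIO(layout_b), ["Date", "Adj Close"]))

print(rows_a)
print(rows_b)
```

Both calls produce the same rows, whereas selecting by fixed position would return different fields for the second layout.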

 