
Reading a CSV file to multiple NumPy arrays in Python

I am trying to read a .csv file of stock prices into separate NumPy arrays inside a getData() function, but I get an indexing error and can't see how to resolve it.

I am new to both the csv module and NumPy, so I'm not sure exactly where the problem is, but when I run this code I get the following error:

File "../StockPlot.py", line 20, in getData
    date[i-1] = data[0]
IndexError: index 0 is out of bounds for axis 0 with size 0

import numpy as np
import matplotlib.pyplot as plt
import csv

def getData():
  date = np.array([])
  openPrice = np.array([])
  closePrice = np.array([])
  volume = np.array([])

  i = 1
  with open('aapl.csv', 'rb') as f:
      reader = csv.reader(open('aapl.csv'))
      data_as_list = list(reader)
      items = len(data_as_list)

      while i < items:
          data = data_as_list[i]
          date[i-1] = data[0]
          openPrice[i-1] = data[1]
          closePrice[i-1] = data[4]
          volume[i-1] = data[5]
          i += 1

  return date, openPrice, closePrice, volume

getData()

The AAPL.csv file I am trying to read has lines taking the form:

Date, Open, High, Low, Close, Volume
26-Jul-17,153.35,153.93,153.06,153.46,15415545
25-Jul-17,151.80,153.84,151.80,152.74,18853932
24-Jul-17,150.58,152.44,149.90,152.09,21493160

I would appreciate any help solving this problem. It seems that data_as_list is a list of lists, one per line of the file, and when I print data[0] etc. inside the while loop the values show up correctly, but I can't assign them to the arrays I created.
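The IndexError comes from how the arrays are created: np.array([]) has shape (0,), i.e. no elements at all, and NumPy arrays have a fixed size, so `date[i-1] = data[0]` has no slot to write into. The snippet below is a minimal reproduction of that behaviour in isolation (it is not the asker's code, just the same operation on an empty array):

```python
import numpy as np

# np.array([]) creates an array with shape (0,): it has no elements at all.
date = np.array([])
print(date.shape)  # (0,)

# Assigning to index 0 fails, as in the traceback in the question,
# because NumPy arrays do not grow when you write past their end.
raised = False
try:
    date[0] = 1.0
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# To "grow" an array you have to build a new one, e.g. with np.append,
# which returns a new array and leaves the original untouched.
date = np.append(date, 1.0)
print(date.shape)  # (1,)
```

Calling np.append inside the loop would work, but it copies the whole array on every call, so collecting the values in Python lists and converting them once at the end is usually the better choice.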

IMO it's much more convenient to use pandas for this:

import pandas as pd

fn = r'/path/to/AAPL.csv'    
df = pd.read_csv(fn, skipinitialspace=True, parse_dates=['Date'])

Result:

In [83]: df
Out[83]:
        Date    Open    High     Low   Close    Volume
0 2017-07-26  153.35  153.93  153.06  153.46  15415545
1 2017-07-25  151.80  153.84  151.80  152.74  18853932
2 2017-07-24  150.58  152.44  149.90  152.09  21493160
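Since the question asks for separate arrays rather than one table, a short sketch (assuming pandas 0.24 or later, where `Series.to_numpy()` exists) pulls each column out on its own. It feeds the question's three sample rows through `io.StringIO` so it runs without the file; the variable names simply mirror what getData() was meant to return:

```python
import io

import pandas as pd

# The three sample rows from the question; with the real file you would pass
# its path (e.g. 'AAPL.csv') to pd.read_csv instead of a StringIO buffer.
SAMPLE = """Date, Open, High, Low, Close, Volume
26-Jul-17,153.35,153.93,153.06,153.46,15415545
25-Jul-17,151.80,153.84,151.80,152.74,18853932
24-Jul-17,150.58,152.44,149.90,152.09,21493160
"""

# skipinitialspace=True strips the blanks after the commas in the header,
# so the columns are named 'Open', 'Close', ... rather than ' Open', ' Close'.
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True,
                 parse_dates=['Date'])

# Each column converts to its own 1-D NumPy array with its own dtype.
date = df['Date'].to_numpy()         # datetime64 values
openPrice = df['Open'].to_numpy()    # float64
closePrice = df['Close'].to_numpy()  # float64
volume = df['Volume'].to_numpy()     # int64
```

Extracting columns one at a time keeps the numeric dtypes; converting the whole DataFrame at once mixes timestamps and numbers, which forces NumPy to fall back to `dtype=object`.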

As a NumPy 2D array:

In [84]: df.values
Out[84]:
array([[Timestamp('2017-07-26 00:00:00'), 153.35, 153.93, 153.06, 153.46, 15415545],
       [Timestamp('2017-07-25 00:00:00'), 151.8, 153.84, 151.8, 152.74, 18853932],
       [Timestamp('2017-07-24 00:00:00'), 150.58, 152.44, 149.9, 152.09, 21493160]], dtype=object)
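If you'd rather stay with the csv module, the original loop works once the values are collected in Python lists and converted to arrays at the end. This is a minimal sketch, assuming Python 3: the file should be opened once, in text mode with `newline=''` (the `'rb'` mode only suited Python 2's csv module), the header row is skipped with `next()`, and, unlike the original, this getData takes an already opened file object so the example can run on the sample rows via `io.StringIO`:

```python
import csv
import io

import numpy as np

# The sample rows from the question, used here instead of the real file.
SAMPLE = """Date, Open, High, Low, Close, Volume
26-Jul-17,153.35,153.93,153.06,153.46,15415545
25-Jul-17,151.80,153.84,151.80,152.74,18853932
24-Jul-17,150.58,152.44,149.90,152.09,21493160
"""


def getData(f):
    """Return date, open, close and volume arrays from an open CSV file."""
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)  # skip the header row
    dates, opens, closes, volumes = [], [], [], []
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines
            continue
        # Python lists grow with append(), unlike the empty NumPy arrays.
        dates.append(row[0])
        opens.append(float(row[1]))
        closes.append(float(row[4]))
        volumes.append(int(row[5]))
    # Convert each list to a NumPy array once, after all rows are read.
    return (np.array(dates), np.array(opens),
            np.array(closes), np.array(volumes))


# With the real file (Python 3):
#   with open('AAPL.csv', newline='') as f:
#       date, openPrice, closePrice, volume = getData(f)
date, openPrice, closePrice, volume = getData(io.StringIO(SAMPLE))
print(closePrice)
```

The dates stay as strings here; if you need real dates, `datetime.strptime(row[0], '%d-%b-%y')` parses the `26-Jul-17` format.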

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM