I have to read large .csv files of around 20 MB each. Those files are tables of 8 columns and 5198 rows, and I have to do some statistics on one specific column, I. I have n different files, and this is what I am doing:
import numpy as np
import pandas as pd

stat = np.arange(n)  # indices of the n files
I = 0
for k in stat:
    df = pd.read_csv(pathS + 'run_TestRandom_%d.csv' % k, sep=' ')
    I += df['I']     # accumulate the column of interest
I = I / n            ## average over the n files (not the last index k)
This process takes 0.65 s, and I am wondering if there is a faster way.
EDIT: Apparently this is a really bad way to do it! Don't do what I did, I guess :/
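One commonly suggested speedup (not from the original post) is to have pandas parse only the column you need, via the usecols option of read_csv; a minimal sketch, assuming the same pathS, n, and file naming as above:

import pandas as pd

# Sketch only: parse just the 'I' column instead of all 8.
total = 0
for k in range(n):
    total += pd.read_csv(pathS + 'run_TestRandom_%d.csv' % k,
                         sep=' ', usecols=['I'])['I']
avg = total / n  # average over the n files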
I'm working on a similar problem right now, with a dataset of about the same size. The method I'm using is NumPy's genfromtxt:
import numpy as np

# With names=..., the result is a 1-D structured array whose
# fields (columns) are accessed by name.
ary2d = np.genfromtxt('yourfile.csv', delimiter=',', skip_header=1,
                      skip_footer=0,
                      names=['col1', 'col2', 'col3', 'col4',
                             'col5', 'col6', 'col7', 'col8'])
On my system this takes about 0.1 s in total. The one problem with it is that any non-numeric value is simply replaced by nan, which may not be what you want.
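Since the fields are named, a column can be pulled out by name, and np.nanmean (a standard NumPy function) averages it while skipping those nan's. A minimal sketch, with col5 as a placeholder for whichever column you need:

import numpy as np

ary2d = np.genfromtxt('yourfile.csv', delimiter=',', skip_header=1,
                      names=['col1', 'col2', 'col3', 'col4',
                             'col5', 'col6', 'col7', 'col8'])
col = ary2d['col5']      # 'col5' is a placeholder field name
print(np.nanmean(col))   # mean that ignores nan entries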