
Python: what is the best way to read a large .csv file?

I have to read large .csv files of around 20 MB. These files are tables composed of 8 columns and 5198 rows. I have to do some statistics over a specific column, I.

I have n different files and this is what I am doing:

import numpy as np
import pandas as pd

stat = np.arange(n)        # indices of the n files
I = 0
for k in stat:
    df = pd.read_csv(pathS + 'run_TestRandom_%d.csv' % k, sep=' ')
    I += df['I']
I = I / n                  # average over the n files (k stops at n-1)

This process takes about 0.65 s and I was wondering if there is a faster way.
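For example, one tweak that might help is asking read_csv to parse only the column that is actually needed, via usecols, instead of all 8 columns; a minimal, untested sketch of the same loop:

import pandas as pd

I = 0
for k in range(n):
    # usecols limits parsing to the single column of interest
    col = pd.read_csv(pathS + 'run_TestRandom_%d.csv' % k,
                      sep=' ', usecols=['I'])['I']
    I += col
I = I / n

Whether this actually saves much depends on where the time goes (parsing vs. disk I/O), so it would need timing on the real files.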

EDIT: Apparently this is a really bad way to do it! Don't do what I did, I guess :/

I'm working on a similar problem right now with a dataset of about the same size. The method I'm using is numpy's genfromtxt:

import numpy as np

# with explicit `names`, genfromtxt returns a structured array whose
# fields are accessed by name, e.g. ary2d['col4']
ary2d = np.genfromtxt('yourfile.csv', delimiter=',', skip_header=1,
                      skip_footer=0,
                      names=['col1', 'col2', 'col3', 'col4',
                             'col5', 'col6', 'col7', 'col8'])

On my system it takes about 0.1 s in total.

The one problem with this is that any non-numeric value is simply replaced by nan, which may not be what you want.
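Applied to the loop in the question, a rough sketch of the same averaging with genfromtxt (assuming the files are space-separated with a header row that names the column I, as the pandas version implies):

import numpy as np

I = 0
for k in range(n):
    # names=True reads the field names from the header line,
    # so the column can be selected as data['I'];
    # the default delimiter splits on whitespace
    data = np.genfromtxt(pathS + 'run_TestRandom_%d.csv' % k, names=True)
    I += data['I']
I = I / n   # any nan from non-numeric entries will propagate through the sum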
