
What is the best way to read the ith column of a csv file with Python?

I am used to R, which offers quick functions for reading CSV files column by column. Can anyone propose a quick and efficient way to read large data files (CSV, for example) in Python, such as reading only the i-th column of a CSV file?

I have the following, but it is slow:

    import csv

    with open('some.csv', newline='') as f:
        reader = csv.reader(f, delimiter=',')
        header = next(reader)        # first row holds the column names
        zipped = list(zip(*reader))  # transposes the whole file in memory
    print(zipped[0])  # the first column

Is there a better way to read data from large files in Python (at least as efficient as R in terms of memory)?

You can also use pandas.read_csv and its usecols argument, which keeps only the columns you list. See the pandas documentation for read_csv.

import pandas as pd

data = pd.read_csv('some.csv', usecols=['col_1', 'col_2', 'col_4'])
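Since the question asks for the i-th column by position rather than by name, it may help to note that usecols also accepts integer positions. The following is a minimal sketch, assuming pandas is installed; the sample data and the helper name read_ith_column are made up for illustration and are not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for 'some.csv'.
SAMPLE_CSV = "col_1,col_2,col_3\n1,a,10.5\n2,b,20.5\n3,c,30.5\n"


def read_ith_column(source, i):
    """Return column i (0-based) as a pandas Series, keeping only that column."""
    frame = pd.read_csv(source, usecols=[i])
    # The resulting frame has exactly one column, so take it by position.
    return frame.iloc[:, 0]


third = read_ith_column(io.StringIO(SAMPLE_CSV), 2)
print(third.tolist())
```

With a real file, the path ('some.csv') can be passed in place of the StringIO object.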
...
import csv

with open('some.csv') as fin:
    reader = csv.reader(fin)
    first_col = [row[0] for row in reader]

What you're doing with zip is loading the entire file into memory and then transposing it to get the column. If you only want the values of one column, collect just that field from each row to begin with.
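If even a single column is too large to hold as a list, the same idea can be written as a generator so that only one row is in memory at a time. This is a sketch under assumptions: the function name iter_column, the skip_header flag, and the sample data are hypothetical and only illustrate the approach.

```python
import csv
import io


def iter_column(lines, i, skip_header=True):
    """Yield the i-th (0-based) field of each CSV row, one row at a time."""
    reader = csv.reader(lines)
    if skip_header:
        next(reader, None)  # discard the header row if there is one
    for row in reader:
        yield row[i]


# Hypothetical in-memory data; a real file would be opened with
# open('some.csv', newline='') and passed in the same way.
sample = io.StringIO("id,name,score\n1,ann,90\n2,bob,85\n")
total = sum(int(value) for value in iter_column(sample, 2))
print(total)  # 175
```

Because the values are consumed as they are produced, aggregations such as a sum or a maximum never need the full column in memory.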

If you wanted multiple columns, then you could do:

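from operator import itemgetter

# reader is the csv.reader from above, used while the file is still open
get_cols = itemgetter(1, 3, 5)      # picks fields 1, 3 and 5 (0-based)
cols = list(map(get_cols, reader))  # one tuple of selected fields per row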
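A self-contained version of the multi-column approach might look like the following. The six-column sample data is invented for illustration; the final zip transposes only the selected fields, not the whole file.

```python
import csv
import io
from operator import itemgetter

# Hypothetical six-column sample data standing in for 'some.csv'.
sample = io.StringIO("a,b,c,d,e,f\n1,2,3,4,5,6\n7,8,9,10,11,12\n")

reader = csv.reader(sample)
header = next(reader)            # ['a', 'b', 'c', 'd', 'e', 'f']
get_cols = itemgetter(1, 3, 5)   # 0-based positions of the wanted fields

# One tuple of selected fields per data row.
rows = [get_cols(row) for row in reader]

# Transpose the selected fields so that each entry is one column.
columns = list(zip(*rows))
print(get_cols(header))
print(columns)
```

Note that itemgetter returns a tuple only when given more than one index; with a single index it returns the bare field.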
