I am used to R, which offers quick functions to read CSV files column by column. Can anyone propose a quick, memory-efficient way to read large data files (CSV, for example) in Python — say, just the i-th column of a CSV file?
I have the following, but it is slow:
import csv

f = open('some.csv', newline='')
reader = csv.reader(f, delimiter=',')
header = next(reader)        # reader.next() works only in Python 2
zipped = list(zip(*reader))  # zip() returns a lazy iterator in Python 3
print(zipped[0])             # the first column
Is there a better way to read data from large files in Python (at least as fast and memory-efficient as R)?
You can also use pandas.read_csv
and its usecols
argument (see the pandas documentation):
import pandas as pd

data = pd.read_csv('some.csv', usecols=['col_1', 'col_2', 'col_4'])
...
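A minimal, self-contained sketch of the usecols approach, using an in-memory buffer in place of the hypothetical 'some.csv' so it runs as-is; the column names are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical CSV data standing in for 'some.csv'
csv_text = "col_1,col_2,col_3,col_4\n1,a,x,10\n2,b,y,20\n"

# usecols (one word) tells the parser to keep only the named columns,
# so the others are never materialized in the resulting DataFrame
df = pd.read_csv(io.StringIO(csv_text), usecols=["col_1", "col_4"])

first_col = df["col_1"].tolist()
print(first_col)  # [1, 2]
```

Because the C parser skips unwanted columns during parsing, this is both faster and lighter on memory than reading the full file and slicing afterwards.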
import csv

with open('some.csv') as fin:
    reader = csv.reader(fin)
    first_col = [row[0] for row in reader]
What you're doing with zip
is loading the entire file into memory and then transposing it to get the column. If you only want one column's values, just collect those into the list to start with.
If you wanted multiple columns, then you could do:
from operator import itemgetter

get_cols = itemgetter(1, 3, 5)       # columns at indices 1, 3 and 5
cols = list(map(get_cols, reader))   # map() is lazy in Python 3
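Putting the pieces together, here is a complete runnable sketch; it uses an in-memory buffer and made-up column names in place of the hypothetical 'some.csv':

```python
import csv
import io
from operator import itemgetter

# Hypothetical in-memory CSV standing in for 'some.csv'
csv_text = "a,b,c,d,e,f\n1,2,3,4,5,6\n7,8,9,10,11,12\n"

fin = io.StringIO(csv_text)
reader = csv.reader(fin)
next(reader)  # skip the header row

# itemgetter(1, 3, 5) pulls the values at indices 1, 3 and 5 from each row
get_cols = itemgetter(1, 3, 5)
cols = [get_cols(row) for row in reader]
print(cols)  # [('2', '4', '6'), ('8', '10', '12')]
```

Note that csv.reader yields strings, so convert with int() or float() if you need numbers.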