
Is there a better way to read a CSV file into lists?

Every time I read a CSV file into lists, I use this long method. Can it be simplified?

  1. Create an empty list for each column.
  2. Read the file row by row and append each field to its list.
import csv
import sys

filename = 'mtms_excelExtraction_m_Model_Definition.csv'
Ana_Type = []
Ana_Length = []
Ana_Text = []
Ana_Space = []
# newline='' is what the csv module documentation recommends when opening CSV files
with open(filename, 'rt', newline='') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            Ana_Type.append(row[0])
            Ana_Length.append(row[1])
            Ana_Text.append(row[2])
            Ana_Space.append(row[3])
    except csv.Error as e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))

This is a good opportunity for you to start using pandas and working with DataFrames.

import pandas as pd

# add header=None if the file has no header row
df = pd.read_csv(filename)

That's one or two lines of code (depending on whether you count the import), and you're done!
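If you still want the four separate lists, the DataFrame columns can be converted back. A minimal sketch, assuming the file has no header row and exactly four columns; the in-memory sample data is hypothetical and stands in for the real file:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file (no header row assumed)
sample = io.StringIO("A,10,hello,1\nB,20,world,2\n")

# header=None: treat the first line as data; names= labels the four columns;
# dtype=str keeps every field as a string, as csv.reader does;
# keep_default_na=False keeps empty fields as "" instead of NaN
df = pd.read_csv(
    sample,
    header=None,
    names=["Ana_Type", "Ana_Length", "Ana_Text", "Ana_Space"],
    dtype=str,
    keep_default_na=False,
)

# Each column is a Series; .tolist() turns it into a plain Python list
Ana_Type = df["Ana_Type"].tolist()
Ana_Length = df["Ana_Length"].tolist()
Ana_Text = df["Ana_Text"].tolist()
Ana_Space = df["Ana_Space"].tolist()
```

Dropping dtype=str lets pandas infer numeric types instead (for example, turning the Length column into integers).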

This one is essentially the numpy way of processing the CSV file, without using numpy. Whether it is better than your original method is largely a matter of taste. Like the numpy and pandas approaches, it loads the whole file into memory and then transposes it into lists:

import csv

with open(filename, 'rt', newline='') as f:
    reader = csv.reader(f)
    tmp = list(reader)
# transpose: the j-th list holds field j of every row
Ana_Type, Ana_Length, Ana_Text, Ana_Space = [[tmp[i][j] for i in range(len(tmp))]
                                             for j in range(len(tmp[0]))]

It uses less code and builds the lists with comprehensions instead of repeated appends, but it uses more memory (as numpy or pandas would), because the whole file is held in memory at once.
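A self-contained version of the same transpose, run on hypothetical in-memory data. It iterates over the rows directly instead of indexing with range(len(tmp)), and guards against an empty file, where tmp[0] would raise IndexError (the fallback of four columns is an assumption matching the question):

```python
import csv
import io

# Hypothetical sample data standing in for the real file
sample = io.StringIO("A,10,hello,1\nB,20,world,2\n")
tmp = list(csv.reader(sample))

# tmp[0] would raise IndexError on an empty file, so fall back to 4 (empty) columns
n_cols = len(tmp[0]) if tmp else 4

# Transpose: column j collects field j from every row
Ana_Type, Ana_Length, Ana_Text, Ana_Space = [[row[j] for row in tmp] for j in range(n_cols)]
```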

Depending on how you process the data later, numpy or pandas could be a good option, but IMHO pulling them in only to load a CSV file into lists is not worth it.

You can use a csv.DictReader:

import csv

with open(filename, 'rt') as f:  
    data = list(csv.DictReader(f, fieldnames=["Type", "Length", "Text", "Space"]))

print(data)

This will give you a single list of dict objects, one per row.
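Each row can then be read by field name, and a single column can still be pulled out as a list when needed. A minimal sketch on hypothetical in-memory data without a header row (when fieldnames is given, DictReader treats the first line as data):

```python
import csv
import io

# Hypothetical sample data with no header row, standing in for the real file
sample = io.StringIO("A,10,hello,1\nB,20,world,2\n")

# fieldnames supplies the keys, so every line (including the first) is data
data = list(csv.DictReader(sample, fieldnames=["Type", "Length", "Text", "Space"]))

# Each row is a dict keyed by the field names
first = data[0]

# Extract one column as a list when you need it
Ana_Type = [row["Type"] for row in data]
```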

You could also use NumPy's genfromtxt:

import numpy as np
# read the rows with NumPy (use the delimiter your file actually uses: ',' for a CSV)
rows = np.genfromtxt('data.csv', dtype='str', delimiter=',')
# call numpy.transpose to convert the rows to columns
cols = np.transpose(rows)

# get the columns as plain Python lists
Ana_Type = cols[0].tolist()
Ana_Length = cols[1].tolist()
Ana_Text = cols[2].tolist()
Ana_Space = cols[3].tolist()

Edit: note that if the file has a header row, the first element of each list will be the column name (example with test data):

['Date', '2020-03-03', '2020-03-04', '2020-03-05', '2020-03-06']
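If you don't want the column name in the lists, genfromtxt can skip the header row with skip_header=1. A minimal sketch on hypothetical in-memory test data (genfromtxt also accepts file-like objects such as io.StringIO):

```python
import io

import numpy as np

# Hypothetical test data whose first line is a header row
sample = io.StringIO("Date,Value\n2020-03-03,1\n2020-03-04,2\n")

# skip_header=1 drops the first line before parsing
rows = np.genfromtxt(sample, dtype=str, delimiter=",", skip_header=1)
cols = rows.T  # same as np.transpose(rows)

# .tolist() gives plain Python strings
dates = cols[0].tolist()
values = cols[1].tolist()
```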

Try this; it collects every column into a defaultdict keyed by the header names:

import csv
from collections import defaultdict
d = defaultdict(list)
with open(filename, mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        for k,v in row.items():
            d[k].append(v)

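This assumes the first row of the file is a header containing the names Ana_Type, Ana_Length, Ana_Text and Ana_Space, because DictReader uses that row as the keys. The keys are the column names:

d.keys()
dict_keys(['Ana_Type', 'Ana_Length', 'Ana_Text', 'Ana_Space'])

and each value is the list of values in that column:

d.get('Ana_Type')
['bla', 'bla1', 'df', 'ccc']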
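An end-to-end sketch of the same approach on hypothetical in-memory data, finishing by unpacking the four lists the question asks for. The columns are pulled out by name, so the order of the columns in the file does not matter:

```python
import csv
import io
from collections import defaultdict

# Hypothetical data; the first line is the header that DictReader uses as keys
sample = io.StringIO(
    "Ana_Type,Ana_Length,Ana_Text,Ana_Space\n"
    "bla,10,hello,1\n"
    "bla1,20,world,2\n"
)

d = defaultdict(list)
for row in csv.DictReader(sample):
    for k, v in row.items():
        d[k].append(v)

# Look each column up by its header name
Ana_Type, Ana_Length, Ana_Text, Ana_Space = (
    d[k] for k in ("Ana_Type", "Ana_Length", "Ana_Text", "Ana_Space")
)
```

Because d is a defaultdict, d['Misspelled'] would silently return a new empty list, whereas d.get('Misspelled') returns None.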

The repetitive calls to list.append can be avoided by reading the csv and using the zip builtin function to transpose the rows.

import io, csv

# Create an example file
buf = io.StringIO('type1,length1,text1,space1\ntype2,length2,text2,space2\ntype3,length3,text3,space3')

reader = csv.reader(buf)
# Uncomment the next line if there is a header row
# next(reader)

Ana_Types, Ana_Length, Ana_Text, Ana_Space = zip(*reader)

print(Ana_Types)
# ('type1', 'type2', 'type3')
print(Ana_Length)
# ('length1', 'length2', 'length3')
# ...

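If you need lists rather than tuples, use this line in place of the zip line above, converting each tuple with a list comprehension (a reader can only be iterated once, so it cannot simply be run after the first zip):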

Ana_Types, Ana_Length, Ana_Text, Ana_Space = [list(x) for x in zip(*reader)]
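map does the same conversion without an explicit comprehension. A minimal sketch on hypothetical in-memory data. Two caveats: zip stops at the shortest row, so a row with a missing field drops that whole column (and the four-way unpacking then fails with ValueError) while extra fields in longer rows are silently ignored; and an empty file also makes the unpacking raise ValueError.

```python
import csv
import io

# Hypothetical sample data (no header row)
buf = io.StringIO("type1,length1,text1,space1\ntype2,length2,text2,space2\n")

reader = csv.reader(buf)
# zip(*reader) yields one tuple per column; map(list, ...) converts each tuple to a list
Ana_Type, Ana_Length, Ana_Text, Ana_Space = map(list, zip(*reader))
```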

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM