
Is there a better way to read files?

Every time I read a CSV file into lists, I use this long method. Can it be simplified?

  1. Create the empty lists
  2. Read the file row by row and append each field to its list
import csv
import sys

filename = 'mtms_excelExtraction_m_Model_Definition.csv'
Ana_Type = []
Ana_Length = []
Ana_Text = []
Ana_Space = []
with open(filename, 'rt') as f:
    reader = csv.reader(f)
    try:
        # Append each of the four fields to its own list
        for row in reader:
            Ana_Type.append(row[0])
            Ana_Length.append(row[1])
            Ana_Text.append(row[2])
            Ana_Space.append(row[3])
    except csv.Error as e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))

This is a good opportunity for you to start using pandas and working with DataFrames.

import pandas as pd

df = pd.read_csv(path_to_csv)

That's 1-2 lines of code (depending on whether you count the import), and you're done!
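
If the end goal is still four separate lists, each DataFrame column can be converted with `tolist()`. This is a minimal sketch, assuming the file has no header row and exactly four columns. The column names and the inline sample data are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file (no header row assumed).
sample = io.StringIO("A,10,hello,2\nB,20,world,4\n")

# header=None tells pandas the first line is data; names= labels the columns.
df = pd.read_csv(sample, header=None, names=["Type", "Length", "Text", "Space"])

# Each column is a Series; tolist() turns it into a plain Python list.
Ana_Type = df["Type"].tolist()
Ana_Length = df["Length"].tolist()
Ana_Text = df["Text"].tolist()
Ana_Space = df["Space"].tolist()
```

Note that pandas infers column types, so numeric columns come back as numbers rather than as the strings that `csv.reader` would give.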

This is essentially the numpy way of processing the CSV file, without using numpy. Whether it is better than your original method is largely a matter of taste. Like the numpy or Pandas approach, it loads the whole file into memory and then transposes it into lists:

import csv

with open(filename, 'rt') as f:
    reader = csv.reader(f)
    tmp = list(reader)  # load every row into memory
# Transpose: build one list per column index j
Ana_Type, Ana_Length, Ana_Text, Ana_Space = [[tmp[i][j] for i in range(len(tmp))]
                                             for j in range(len(tmp[0]))]

It uses less code and builds the lists with comprehensions instead of repeated appends, but it uses more memory (as would numpy or pandas).

Depending on how you later process the data, numpy or Pandas could be a nice option. But IMHO, using them only to load a CSV file into lists is not worth it.

You can use a DictReader:

import csv

with open(filename, 'rt') as f:  
    data = list(csv.DictReader(f, fieldnames=["Type", "Length", "Text", "Space"]))

print(data)

This will give you a single list of dict objects, one per row.
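
If individual columns are still needed, they can be pulled out of that list of dicts with a comprehension. A small sketch, using an in-memory sample in place of the real file (the sample data is made up, and no header row is assumed):

```python
import csv
import io

# Hypothetical sample standing in for the real file (no header row assumed).
sample = io.StringIO("A,10,hello,2\nB,20,world,4\n")

data = list(csv.DictReader(sample, fieldnames=["Type", "Length", "Text", "Space"]))

# Pull one column out of the list of row dicts.
Ana_Type = [row["Type"] for row in data]
Ana_Length = [row["Length"] for row in data]
```

All values stay strings here, since the csv module does no type conversion.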

This could be useful:

import numpy as np
# read the rows with Numpy
rows = np.genfromtxt('data.csv',dtype='str',delimiter=';')
# call numpy.transpose to convert the rows to columns
cols = np.transpose(rows)

# get the stuff as lists
Ana_Type = list(cols[0])
Ana_Length = list(cols[1])
Ana_Text = list(cols[2])
Ana_Space = list(cols[3])

Edit: note that if the file has a header row, the first element of each list will be the column name (example with test data):

['Date', '2020-03-03', '2020-03-04', '2020-03-05', '2020-03-06']
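
One way to separate that header from the values is star-unpacking. This is only a sketch on a plain list copied from the test output above. The names `column`, `name` and `values` are illustrative:

```python
# Hypothetical column as produced by the transpose step, header included.
column = ["Date", "2020-03-03", "2020-03-04", "2020-03-05", "2020-03-06"]

# Star-unpacking separates the first element (the header) from the rest.
name, *values = column
```

Slicing with `column[1:]` is an equivalent alternative when the header name itself is not needed.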

Try this:

import csv
from collections import defaultdict
d = defaultdict(list)
with open(filename, mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        for k,v in row.items():
            d[k].append(v)

Next:

d.keys()
dict_keys(['Ana_Type', 'Ana_Length', 'Ana_Text', 'Ana_Space'])

Next:

d.get('Ana_Type')
['bla','bla1','df','ccc']

The repetitive calls to list.append can be avoided by reading the CSV and using the zip builtin function to transpose the rows.

import io, csv

# Create an example file
buf = io.StringIO('type1,length1,text1,space1\ntype2,length2,text2,space2\ntype3,length3,text3,space3')

reader = csv.reader(buf)
# Uncomment the next line if there is a header row
# next(reader)

Ana_Types, Ana_Length, Ana_Text, Ana_Space = zip(*reader)

print(Ana_Types)
('type1', 'type2', 'type3')
print(Ana_Length)
('length1', 'length2', 'length3')
...

If you need lists rather than tuples, you can use a list or generator comprehension to convert them:

Ana_Types, Ana_Length, Ana_Text, Ana_Space = [list(x) for x in zip(*reader)]
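
Applied to a real file, the same idea might be wrapped in a small helper. This is a sketch, not a definitive implementation: `read_columns` and its `skip_header` flag are hypothetical names, and the temporary file only exists so the example runs on its own.

```python
import csv
import os
import tempfile


def read_columns(path, skip_header=False):
    """Return the CSV columns as a list of lists (hypothetical helper)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        if skip_header:
            next(reader, None)  # discard the header row if present
        # zip(*rows) transposes rows into columns; it stops at the shortest row.
        return [list(col) for col in zip(*reader)]


# Write a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("type1,length1,text1,space1\ntype2,length2,text2,space2\n")

Ana_Type, Ana_Length, Ana_Text, Ana_Space = read_columns(tmp.name)
os.remove(tmp.name)
```

Because zip truncates to the shortest row, a ragged file silently loses trailing fields, and an empty file returns an empty list, so the four-way unpacking would fail in that case.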
