
Simultaneous line-by-line reading of n-number of files in Python

I have an unknown number (it may and will change over time) of measurement-data CSV files in a folder on which I would like to perform statistics. All the CSVs have the same 5 columns of data. I want to be able to do statistical analysis on each line separately (average over multiple measurements, stdev, etc.). At the moment I've gotten as far as listing the files in the folder, stashing their names in a list, and trying to open the files from that list. It gets very confusing when trying to iterate over lines across files. Right now I'm just trying to append the contents to a list and output them into another file. No luck. The code may not be very clean, I'm a beginner in programming, but here we go:

import re
import os

lines_to_skip = 25
workingdir = os.path.dirname(os.path.realpath(__file__))
file_list = []
templine = []
lineNo = 0

print ("Working in %s" % workingdir)
os.chdir(workingdir)
for file in os.listdir(workingdir):
    if file.endswith('.csv'):
        # list only the file name without extension (to be able to use the filename as a variable later)
        file_list.append(file[0:-4])

# open all files in the folder
print (file_list)
for i, value in enumerate(file_list):
    exec "%s = open(file_list[i] + '.csv', 'r')" % (value)

# open output stats file
fileout = open('zoutput.csv', 'w')

# assuming that all files are of equal length (as they should be)
exec "for x in len(%s + '.csv'):" % (file_list[0])
for i in xrange(lines_to_skip):
    exec "%s.next()" % (file_list[0])
    for j, value in enumerate(file_list):
        templine[:] = []
        #exec "filename%s=value" % (j)
        exec "line = %s.readline(x)" % (value)
        templine.extend(line)
    fileout.write(templine)

fileout.close()
# close all files in the folder
for i, value in enumerate(file_list):
    #exec "filename%s=value" % (i)
    exec "%s.close()" % (value)

Any suggestions how I could do it another way or improve the existing approach? The first 25 lines are just info fields, which for my purpose are useless. I could just remove the first 25 lines from each file separately (instead of trying to skip them), but I guess it doesn't matter much. Please don't recommend spreadsheets or other statistical software - none of the ones I've tried so far can chew through the amount of data I have. Thanks

If I understand your question correctly, you want to paste the columns of each file onto one another and, from N files with C columns and R rows, process one row at a time, where each row has N*C columns?

$ cat rowproc.py
import sys

for l in sys.stdin:
    row = map(float, l.split())
# process row

$ paste *.csv | tail -n +26 | python rowproc.py
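(paste merges the files side by side, one combined line per row, and tail -n +26 starts output at line 26, i.e. it drops the 25 header lines.) The "# process row" comment marks where the per-line statistics would go. A minimal sketch of that step, assuming Python 3, whitespace-separated numeric columns as in the example above, and a hypothetical rowstats.py:

$ cat rowstats.py
import sys
import statistics

# hypothetical sketch: print the mean and stdev of each pasted row
for line in sys.stdin:
    values = [float(v) for v in line.split()]
    print(statistics.mean(values), statistics.stdev(values))

$ paste *.csv | tail -n +26 | python3 rowstats.py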

Or, if you're unlucky enough not to have a Unix-like environment handy and have to do everything in Python:

import sys
from  itertools import izip

filehandles = [ open(fn) for fn in sys.argv[1:] ]
for i, rows in enumerate(izip(*filehandles)):
    if i < 25: continue  # skip the 25 header lines

    # one row from each file, each split into a list of floats
    cols = [ map(float, row.split()) for row in rows ]
    print cols

Result:

[[150.0, 26.0], [6.0, 8.0], [14.0, 10.0]]
[[160.0, 27.0], [7.0, 9.0], [16.0, 11.0]]
[[170.0, 28.0], [8.0, 10.0], [18.0, 12.0]]
...

As long as you're able to open enough files simultaneously, both of these methods will handle arbitrarily large amounts of data.
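(On Python 3 the itertools.izip import goes away: the built-in zip is already lazy, and print becomes a function, print(cols).)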

If you can't pass the filenames through argv, then use glob.
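A minimal sketch of that variant, assuming the CSV files sit in the current directory:

import glob

# collect every CSV in the current directory instead of reading sys.argv;
# sorted() keeps the column order deterministic across runs
filehandles = [ open(fn) for fn in sorted(glob.glob('*.csv')) ]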
