Python - sort csv data from multiple files based on columns
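I have a folder containing multiple files, and each file has a different number of columns. I want to walk through the directory, open each file, loop through every line, and write the line to a new CSV file depending on the number of columns in that line. In the end I want a single large CSV for all lines with 14 columns, another large CSV for all lines with 18 columns, and a final CSV containing all the other lines.
This is what I have so far.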
import pandas as pd
import glob
import os
import csv

path = r'C:\Users\Vladimir\Documents\projects\ETLassig\W3SVC2'
all_files = glob.glob(os.path.join(path, "*.log"))

for file in all_files:
    for line in file:
        if len(line.split()) == 14:
            with open('c14.csv', 'wb') as csvfile:
                csvwriter = csv.writer(csvfile, delimiter=' ')
                csvwriter.writerow([line])
        elif len(line.split()) == 18:
            with open('c14.csv', 'wb') as csvfile:
                csvwriter = csv.writer(csvfile, delimiter=' ')
                csvwriter.writerow([line])
            #open 18.csv
        else:
            with open('misc.csv', 'wb') as csvfile:
                csvwriter = csv.writer(csvfile, delimiter=' ')
                csvwriter.writerow([line])
print(c14.csv)
Can anyone offer feedback on how to approach this?
You can collect all the columns as lists inside a list:
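l = []
for file in your_files:  # your_files: placeholder for your list of input paths
    with open(file, 'r') as f:
        for line in f:
            # split on single spaces; the last field keeps its trailing newline
            l.append(line.split(" "))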
Now you have a list of lists, so just sort it by the length of the sublists and then write it to a new file:
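l.sort(key=len)
with open(outputfile, 'w') as out:  # outputfile: placeholder for your output path
    for fields in l:
        # Write lines here as you want, for example re-joined with spaces
        out.write(" ".join(fields))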
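The sort-then-write idea can also be used to bucket the lines by field count before writing anything. The following is only a sketch under assumptions: the function name `group_lines_by_width` and the sample lines are made up for illustration, and it uses `itertools.groupby` on the length-sorted list, which the answer above does not mention.

```python
from itertools import groupby

def group_lines_by_width(lines):
    """Return {field_count: [lines...]} for an iterable of text lines."""
    rows = [line.split() for line in lines]
    # groupby only merges adjacent items, so sort by length first.
    # list.sort is stable, so lines keep their original relative order.
    rows.sort(key=len)
    return {width: [" ".join(row) for row in group]
            for width, group in groupby(rows, key=len)}

# Hypothetical sample data standing in for lines read from the log files.
sample = ["a b c", "d e", "f g h", "i j"]
grouped = group_lines_by_width(sample)
print(grouped)
```

Each key of `grouped` could then be written to its own output file. Note that this holds every line in memory at once, which may not suit very large inputs.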
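Before going further, note that you can copy lines as they are from the input files to the output; there is no need for the CSV machinery.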
That said, I suggest using a dictionary of file objects and the dictionary's get method, which lets you specify a default value.
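# Map a field count to the output file that collects those lines.
# Text mode ('w') is needed because the lines read below are str, not bytes.
files = {14: open('14.csv', 'w'),
         18: open('18.csv', 'w')}
other = open('other.csv', 'w')  # receives every other line

for file in all_files:
    for line in open(file):
        llen = len(line.split())
        # get() falls back to `other` when llen is neither 14 nor 18
        target = files.get(llen, other)
        target.write(line)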
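The snippet above never closes its output files. One possible way to handle that is `contextlib.ExitStack`, sketched below. The function name `split_by_field_count`, its `widths` parameter, and the throwaway sample file are assumptions made for illustration; they are not part of the original answer.

```python
import os
import tempfile
from contextlib import ExitStack

def split_by_field_count(in_paths, out_dir, widths=(14, 18)):
    """Copy each line to <width>.csv, or to other.csv, based on its field count."""
    with ExitStack() as stack:
        # Every file registered on the stack is closed when the block exits.
        files = {w: stack.enter_context(open(os.path.join(out_dir, f"{w}.csv"), "w"))
                 for w in widths}
        other = stack.enter_context(open(os.path.join(out_dir, "other.csv"), "w"))
        for path in in_paths:
            with open(path) as src:
                for line in src:
                    files.get(len(line.split()), other).write(line)

# Usage with a temporary directory and a made-up three-line log file.
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "sample.log")
    with open(log, "w") as f:
        f.write(" ".join(["x"] * 14) + "\n")
        f.write(" ".join(["y"] * 18) + "\n")
        f.write("only three fields\n")
    split_by_field_count([log], tmp)
    with open(os.path.join(tmp, "14.csv")) as f:
        c14_text = f.read()
    with open(os.path.join(tmp, "other.csv")) as f:
        other_text = f.read()
```

Because the output files are opened once, before the loop, each line is appended to the right file instead of overwriting the previous one.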
If you have to process a few million records, take note of these timings:
In [20]: a = 'a '*20
In [21]: %timeit len(a.split())
599 ns ± 1.59 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [22]: %timeit a.count(' ')+1
328 ns ± 1.28 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
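Because counting spaces is faster than splitting, you should replace the for loop above with the following (this assumes fields are separated by exactly one space, with no leading or trailing spaces):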
for file in all_files:
    for line in open(file):
        fields_count = line.count(' ')+1
        target = files.get(fields_count, other)
        target.write(line)
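I say "should" because, even though we are talking about nanoseconds, a write to the file takes about as long as the splitting or counting itself:
In [23]: f = open('dele000', 'w')
In [24]: %timeit f.write(a)
508 ns ± 154 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)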
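Before swapping `len(line.split())` for `line.count(' ')+1`, it may be worth checking that both give the same answer on your data. The two helper functions and the sample strings below are hypothetical, written only to show where the two approaches diverge.

```python
def fields_by_split(line):
    # split() with no argument collapses runs of whitespace and ignores the ends
    return len(line.split())

def fields_by_count(line):
    # counts every single space, so doubled or trailing spaces inflate the result
    return line.count(' ') + 1

clean = "a b c\n"
messy = "a  b c \n"  # a doubled space and a trailing space

print(fields_by_split(clean), fields_by_count(clean))  # 3 3
print(fields_by_split(messy), fields_by_count(messy))  # 3 5
```

If the logs can contain doubled or trailing spaces, the faster count will send lines to the wrong file, so the speedup is only safe on strictly single-space-delimited input.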