Python script not combining csv files
I am trying to combine over 100,000 CSV files (all in the same format) in one folder using the script below. Each CSV file is 3-6 KB on average. When I run the script, it opens exactly 47 .csv files and combines only those. When I re-run it, it combines the same .csv files again, not all of them. I don't understand why it is doing that.
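import os
import glob

# Raw string so the backslashes in the Windows path are not read as escapes
os.chdir(r"D:\Users\Bop\csv")

want_header = True
out_filename = "combined.files.csv"

# Remove any previous output so glob does not pick it up as an input
if os.path.exists(out_filename):
    os.remove(out_filename)

read_files = glob.glob("*.csv")

with open(out_filename, "w") as outfile:
    for filename in read_files:
        with open(filename) as infile:
            if want_header:
                # Write the header once, with an extra Filename column
                outfile.write('{},Filename\n'.format(next(infile).strip()))
                want_header = False
            else:
                # Skip the header line of every later file
                next(infile)
            for line in infile:
                outfile.write('{},{}\n'.format(line.strip(), filename))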
First, check the length of read_files:
read_files = glob.glob("*.csv")
print(len(read_files))
Note that glob isn't necessarily recursive, as described in this SO question.
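If the CSV files are spread over subfolders, a recursive pattern may pick up the ones a flat "*.csv" misses. The following is a minimal sketch, assuming Python 3.5 or later (where glob.glob accepts recursive=True); the helper name find_csv_files is made up for illustration and is not part of the original script.

```python
import glob
import os


def find_csv_files(root, recursive=False):
    """Return sorted paths of .csv files under root (hypothetical helper)."""
    if recursive:
        # "**" spans zero or more directory levels, but only with recursive=True
        pattern = os.path.join(root, "**", "*.csv")
    else:
        # Flat search: only files directly inside root
        pattern = os.path.join(root, "*.csv")
    return sorted(glob.glob(pattern, recursive=recursive))
```

Comparing len(find_csv_files(folder)) with len(find_csv_files(folder, recursive=True)) shows whether most of the files sit below the top-level folder.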
Otherwise your code looks fine. You may want to consider using the csv library, but note that you need to adjust the field size limit with really large files.
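For reference, here is one way the merge could look with the csv module. It is a sketch under assumptions, not the asker's script: it targets Python 3, the function name combine_csv and its field_limit parameter are invented for illustration, and empty input files are skipped rather than treated as errors. csv.field_size_limit is the standard-library call for raising the per-field limit.

```python
import csv
import glob
import os


def combine_csv(folder, out_path, field_limit=None):
    """Merge every *.csv in folder into out_path, adding a Filename column.

    Returns the number of input files that contributed a header or rows.
    (Hypothetical helper, not from the original post.)
    """
    if field_limit is not None:
        # Raise the maximum field size for files with very large fields
        csv.field_size_limit(field_limit)

    out_abs = os.path.abspath(out_path)
    # Collect inputs up front and never treat the output file as an input
    paths = [p for p in sorted(glob.glob(os.path.join(folder, "*.csv")))
             if os.path.abspath(p) != out_abs]

    header_written = False
    merged = 0
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for path in paths:
            with open(path, newline="") as infile:
                reader = csv.reader(infile)
                header = next(reader, None)
                if header is None:
                    continue  # empty file: nothing to copy
                if not header_written:
                    writer.writerow(header + ["Filename"])
                    header_written = True
                for row in reader:
                    writer.writerow(row + [os.path.basename(path)])
            merged += 1
    return merged
```

Comparing the return value with the number of files in the folder gives a quick check on how many files were actually read.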
Are you sure all your filenames end with .csv? If all files in this directory contain what you need, then open all of them without filtering:
glob.glob('*')
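Before dropping the filter, it may help to see which extensions the folder actually contains. The sketch below tallies files by lowercased extension; count_extensions is a hypothetical helper name, and it uses os.scandir (Python 3.5+) rather than glob.

```python
import os
from collections import Counter


def count_extensions(folder):
    """Tally regular files in folder by lowercased extension (hypothetical helper)."""
    counts = Counter()
    for entry in os.scandir(folder):
        if entry.is_file():
            # splitext returns ("name", ".ext"); files without a dot give ""
            ext = os.path.splitext(entry.name)[1].lower()
            counts[ext] += 1
    return counts
```

If count_extensions(folder)[".csv"] is far below the expected 100,000, the remaining files probably carry a different extension or none at all.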