
Python script not combining CSV files

I am trying to combine over 100,000 CSV files (all in the same format) in a folder using the script below. Each CSV file is 3-6 KB on average. When I run the script, it opens exactly 47 .csv files and combines only those. When I re-run it, it combines the same 47 files again, not all of them. Why is it doing that?

import os
import glob

os.chdir(r"D:\Users\Bop\csv")  # raw string, so "\U" is not parsed as a Unicode escape in Python 3

want_header = True
out_filename = "combined.files.csv"          

if os.path.exists(out_filename):
    os.remove(out_filename)

read_files = glob.glob("*.csv")

with open(out_filename, "w") as outfile:
    for filename in read_files:
        with open(filename) as infile:
            if want_header:
                outfile.write('{},Filename\n'.format(next(infile).strip()))
                want_header = False
            else:
                next(infile)
            for line in infile:
                outfile.write('{},{}\n'.format(line.strip(), filename))

First, check the length of read_files:

read_files = glob.glob("*.csv")
print(len(read_files))
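
If that count comes out lower than expected, it may help to compare it with what is actually in the folder. The following is a minimal sketch, not part of the original script: the helper name `summarize_folder` is hypothetical, and it assumes the files sit directly in one folder.

```python
import glob
import os
from collections import Counter


def summarize_folder(folder):
    """Return (number of *.csv matches, Counter of extensions for all entries)."""
    matched = glob.glob(os.path.join(folder, "*.csv"))
    # os.listdir returns every entry, so it shows what the pattern skipped.
    extensions = Counter(
        os.path.splitext(name)[1].lower() for name in os.listdir(folder)
    )
    return len(matched), extensions


if __name__ == "__main__":
    count, extensions = summarize_folder(".")
    print("glob matched:", count)
    print("extensions found:", dict(extensions))
```

A large count under an unexpected extension (or under the empty extension) would suggest that the pattern, rather than the merge loop, is what limits the result.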

Note that glob is not recursive by default, as described in this SO question.
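
If the CSV files turn out to be spread across subfolders, a recursive pattern can pick them up. This is a sketch that assumes Python 3.5 or later (where `glob.glob` accepts `recursive=True`); the function name `find_csv_recursive` is made up for illustration.

```python
import glob
import os


def find_csv_recursive(folder):
    """Return *.csv paths in folder and in all of its subfolders."""
    # With recursive=True, "**" matches zero or more directory levels,
    # so files directly inside folder are included as well.
    pattern = os.path.join(folder, "**", "*.csv")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    print(len(find_csv_recursive(".")))
```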

Otherwise your code looks fine. You may want to consider using the csv library, but note that you may need to raise its field size limit for really large files.
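
For reference, here is a rough sketch of the same merge written with the csv module. It is one possible rewrite under stated assumptions, not a drop-in fix: the function name `combine_csv` and the 10 MB `field_limit` default are illustrative choices, and empty input files are simply skipped.

```python
import csv
import glob
import os


def combine_csv(folder, out_path, field_limit=10 * 1024 * 1024):
    """Merge every *.csv in folder into out_path, appending a Filename column."""
    # Raise the per-field size cap; the value used here is an assumption.
    csv.field_size_limit(field_limit)
    out_abs = os.path.abspath(out_path)
    header_written = False
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
            if os.path.abspath(path) == out_abs:
                continue  # never read the output file back in
            with open(path, newline="") as infile:
                reader = csv.reader(infile)
                header = next(reader, None)  # None for an empty file
                if header is None:
                    continue
                if not header_written:
                    writer.writerow(header + ["Filename"])
                    header_written = True
                for row in reader:
                    writer.writerow(row + [os.path.basename(path)])
    return out_path
```

Using `csv.reader` and `csv.writer` keeps quoted fields that contain commas or line breaks intact, which plain string formatting does not guarantee.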

Are you sure all your filenames end with .csv? If every file in this directory contains what you need, then open all of them without filtering:

glob.glob('*') 
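
One caveat with an unfiltered pattern is that it also returns subdirectories and, if it already exists, the output file. A small sketch of how those could be excluded follows; the helper name `list_input_files` is hypothetical, and the default output name is taken from the question's script.

```python
import glob
import os


def list_input_files(folder, out_filename="combined.files.csv"):
    """Return all regular files in folder except the output file."""
    paths = glob.glob(os.path.join(folder, "*"))
    return sorted(
        p for p in paths
        # Skip directories and the merged output itself.
        if os.path.isfile(p) and os.path.basename(p) != out_filename
    )


if __name__ == "__main__":
    print(len(list_input_files(".")))
```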


 