
Combining multiple csv files into one csv file

I am trying to combine multiple csv files into one, and have tried a number of methods but I am struggling.

I import the data from multiple csv files, but when I compile them into one csv file, only the first few rows get filled out nicely. After that, a variable number of blank lines starts appearing between the rows, and the combined csv file never finishes filling out: it just keeps getting information added to it. That does not make sense to me, because I am trying to compile a finite amount of data.

I have already tried writing close statements for the file, and I still get the same result: my designated combined csv file never stops getting data, and the data is spaced out randomly throughout the file. I just want a normally compiled csv.

Is there an error in my code? Is there any explanation as to why my csv file is behaving this way?

csv_file_list = glob.glob(Dir + '/*.csv') #returns the file list
print (csv_file_list)
with open(Avg_Dir + '.csv','w') as f:
    wf = csv.writer(f, delimiter = ',')
    print (f)
    for files in csv_file_list:
        rd = csv.reader(open(files,'r'),delimiter = ',')
        for row in rd:
            print (row)
            wf.writerow(row)

Do your files all have the same structure? They need to share the same columns to be joined cleanly; otherwise the combined file will be inconsistent. If the structure varies between files, you can map the data onto a common layout to generate the final file.

Your code itself is correct. If the files all share one structure and you are on a Unix-like operating system (Linux, macOS, etc.), you could merge them with the "cat" command alone. If you have several files with different structures, Python is a good fit, but in that case your code will need some modifications.
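For the "different structures" case, here is a minimal sketch of one way to map the files onto a common layout, assuming Python 3 and that every file starts with a header row. The function name merge_csv_with_different_columns and its parameters are made up for illustration, and the output file is assumed to live outside the source folder.

```python
import csv
import glob
import os


def merge_csv_with_different_columns(source_dir, output_path):
    """Merge every CSV in source_dir into output_path, aligning rows by column name."""
    paths = sorted(glob.glob(os.path.join(source_dir, '*.csv')))

    # First pass: collect the union of all column names, in first-seen order.
    fieldnames = []
    for path in paths:
        with open(path, newline='') as rf:
            header = next(csv.reader(rf), [])  # [] if the file is empty
            for name in header:
                if name not in fieldnames:
                    fieldnames.append(name)

    # Second pass: write every row; columns a file lacks are left empty,
    # and cells beyond a file's own header are ignored.
    with open(output_path, 'w', newline='') as wf:
        writer = csv.DictWriter(wf, fieldnames=fieldnames,
                                restval='', extrasaction='ignore')
        writer.writeheader()
        for path in paths:
            with open(path, newline='') as rf:
                for row in csv.DictReader(rf):
                    writer.writerow(row)
    return fieldnames
```

The two passes keep memory use low, at the cost of opening each file twice.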

Your code works for me.

Alternatively, you can merge files as follows:

import glob

csv_file_list = glob.glob(Dir + '/*.csv')
with open(Avg_Dir + '.csv','w') as wf:
    for file in csv_file_list:
        with open(file) as rf:
            for line in rf:
                if line.strip(): # if line is not empty
                    if not line.endswith("\n"):
                        line+="\n"
                    wf.write(line)

Or, if the files are not too large, you can read each file in one go. Note that in this case any empty lines inside a file and every file's header will be copied:

import glob

csv_file_list = glob.glob(Dir + '/*.csv')
with open(Avg_Dir + '.csv','w') as wf:
    for file in csv_file_list:
        with open(file) as rf:
            wf.write(rf.read().strip()+"\n")
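If the repeated headers are a problem, a variation along these lines may help. It is only a sketch, assuming that every file has the same one-line header as its first line and that no quoted field contains a line break; merge_keep_first_header is a made-up name.

```python
import glob
import os


def merge_keep_first_header(source_dir, output_path):
    """Concatenate CSVs line by line, keeping the header from the first file only."""
    paths = sorted(glob.glob(os.path.join(source_dir, '*.csv')))
    with open(output_path, 'w') as wf:
        for index, path in enumerate(paths):
            with open(path) as rf:
                for line_number, line in enumerate(rf):
                    if line_number == 0 and index > 0:
                        continue  # drop the repeated header
                    if not line.strip():
                        continue  # drop empty lines
                    # make sure the last line of a file ends with a newline
                    wf.write(line if line.endswith('\n') else line + '\n')
```

As with the snippets above, the output file should not sit in the folder being read.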

Consider several adjustments:

  1. Use a context manager (the with statement) for both the read and the write process. This avoids the need to close() the file objects, which your code never does for the files it reads.
  2. For the skipped-lines issue: use either the newline='' argument in open() or the lineterminator="\n" argument in csv.writer(). On Windows, the writer's default "\r\n" terminator is otherwise translated to "\r\r\n" by text mode, which shows up as a blank row after each record.
  3. Use os.path.join() to properly concatenate folder and file paths. This method is OS-agnostic, so it works on both Windows and Unix machines regardless of whether they use backslashes or forward slashes.

Adjusted script:

import os
import csv, glob

Dir = r"C:\Path\To\Source"
Avg_Dir = r"C:\Path\To\Destination\Output"

csv_file_list = glob.glob(os.path.join(Dir, '*.csv')) # returns the file list
print (csv_file_list)

with open(Avg_Dir + '.csv', 'w', newline='') as f:
    wf = csv.writer(f, lineterminator='\n')

    for files in csv_file_list:
        with open(files, 'r', newline='') as r:
            rr = csv.reader(r)
            next(rr, None)            # SKIP HEADERS (in every file; no error if a file is empty)
            for row in rr:
                wf.writerow(row)
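One possible explanation for a combined file that never stops growing, which none of the snippets here guard against: if Avg_Dir + '.csv' sits inside Dir and already exists from an earlier run, glob.glob() lists it as an input, and the loop ends up reading the very file it is writing to. This is only a guess, since the question does not show the two paths. A minimal sketch of a guard, with list_input_files as a made-up helper name:

```python
import glob
import os


def list_input_files(source_dir, output_path):
    """Return the CSV files in source_dir, leaving out the output file itself."""
    # normcase + abspath so the comparison also works on case-insensitive Windows paths
    output_key = os.path.normcase(os.path.abspath(output_path))
    return [
        path
        for path in sorted(glob.glob(os.path.join(source_dir, '*.csv')))
        if os.path.normcase(os.path.abspath(path)) != output_key
    ]
```

It could be used as csv_file_list = list_input_files(Dir, Avg_Dir + '.csv') in place of the plain glob.glob() call.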
