How to merge 300 files into 3 files?

I have a bunch of files in a folder. They fall into three types, each with its own schema, and I want to group the files by type.

  1. 'FFIEC CDR Call Bulk POR'
  2. 'FFIEC CDR Call Schedule CI'
  3. 'FFIEC CDR Call Schedule ENT'

I want to combine these 300 files into 3 CSV files, one per type, grouping them by the shared part of the file name.

Here are some of the actual file names.

FFIEC CDR Call Bulk POR 03312001.txt
FFIEC CDR Call Bulk POR 03312002.txt
...
FFIEC CDR Call Schedule CI 03312001.txt
FFIEC CDR Call Schedule CI 03312002.txt
...
FFIEC CDR Call Schedule ENT 03312001.txt
FFIEC CDR Call Schedule ENT 03312002.txt

I think the problem is with this line:

if x in f:

Here is the code that I am testing.

import os, glob
import pandas as pd

mylist = ['FFIEC CDR Call Bulk POR',
        'FFIEC CDR Call Schedule CI',
        'FFIEC CDR Call Schedule ENT']

path = "C:\\Users\\ryans\\OneDrive\\Desktop\\schemas\\"

all_files = glob.glob(os.path.join(path, "*.txt"))

all_df = []
for f in all_files: 
    for x in mylist:
        if x in f:
            print(x)
            df = pd.read_csv(f, delimiter='\t', skiprows=1) 
            df['file'] = os.path.basename(f)
            all_df.append(df) 

df_append = pd.concat(all_df, ignore_index=True, sort=True)
df_append.to_csv("C:\\Users\\ryans\\OneDrive\\Desktop\\" + x + ".csv")

When I run this code, everything is dumped into one single CSV file. I want three separate CSV files, one for each of the similar/grouped file names. I made some progress, but I couldn't quite figure it out.

Frankly, the problem looks simple, so I'm not sure where you got stuck.

You append every DataFrame to the same list

all_df.append(df) 

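so in the end everything has to be written to one CSV. The file name is built from x after the loops have finished, so x is simply the last entry in mylist and the single file is named after the ENT type.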
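
To see the effect without touching any files, here is a small sketch with made-up path strings. The names paths and collected are illustrative only and are not from the question. Every match lands in the same list, and once the loops finish x still holds the last entry of mylist:

```python
mylist = [
    'FFIEC CDR Call Bulk POR',
    'FFIEC CDR Call Schedule CI',
    'FFIEC CDR Call Schedule ENT',
]

# Hypothetical file names standing in for the result of glob.glob()
paths = [
    'FFIEC CDR Call Bulk POR 03312001.txt',
    'FFIEC CDR Call Schedule CI 03312001.txt',
    'FFIEC CDR Call Schedule ENT 03312001.txt',
]

collected = []  # one list shared by all three types
for f in paths:
    for x in mylist:
        if x in f:
            collected.append(f)

print(len(collected))  # every file ended up in the same list
print(x + ".csv")      # file name built from the leftover loop variable
```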

You should create three lists, one for each type of file, or a single dictionary that holds the three lists

all_df = {
    'FFIEC CDR Call Bulk POR': [],      # list for files `FFIEC CDR Call Bulk POR`
    'FFIEC CDR Call Schedule CI': [],   # list for files `FFIEC CDR Call Schedule CI`
    'FFIEC CDR Call Schedule ENT': [],  # list for files `FFIEC CDR Call Schedule ENT`
}

Then you can use x as the key to put each DataFrame in the correct list

all_df[x].append(df)
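
The grouping step on its own can be sketched without pandas. This is only an illustration: group_by_prefix and the sample paths are hypothetical names, and it matches on the start of the base file name rather than on the full path, which is a slightly stricter test than `x in f`. The dictionary is built from the list of prefixes so the keys do not have to be typed twice.

```python
import os

prefixes = [
    'FFIEC CDR Call Bulk POR',
    'FFIEC CDR Call Schedule CI',
    'FFIEC CDR Call Schedule ENT',
]

def group_by_prefix(paths, prefixes):
    """Map each prefix to the paths whose file name starts with it."""
    groups = {p: [] for p in prefixes}  # one empty list per type
    for path in paths:
        name = os.path.basename(path)
        for p in prefixes:
            if name.startswith(p):
                groups[p].append(path)
                break  # a file belongs to one type only
    return groups

# Hypothetical paths; in the real script these would come from glob.glob()
sample = [
    'schemas/FFIEC CDR Call Bulk POR 03312001.txt',
    'schemas/FFIEC CDR Call Bulk POR 03312002.txt',
    'schemas/FFIEC CDR Call Schedule CI 03312001.txt',
    'schemas/FFIEC CDR Call Schedule ENT 03312001.txt',
]

groups = group_by_prefix(sample, prefixes)
for prefix, files in groups.items():
    print(prefix, len(files))
```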

After that loop, you can use a second loop to save the three files

for x in mylist:
    # use `all_df[x]`: only the DataFrames collected for this type
    df_append = pd.concat(all_df[x], ignore_index=True, sort=True)
    df_append.to_csv("C:\\Users\\ryans\\OneDrive\\Desktop\\" + x + ".csv")

Full working code:

import os, glob
import pandas as pd

mylist = [
    'FFIEC CDR Call Bulk POR',
    'FFIEC CDR Call Schedule CI',
    'FFIEC CDR Call Schedule ENT'
]

path = "C:\\Users\\ryans\\OneDrive\\Desktop\\schemas\\"

all_files = glob.glob(os.path.join(path, "*.txt"))

all_df = {
    'FFIEC CDR Call Bulk POR': [],      # list for files `FFIEC CDR Call Bulk POR`
    'FFIEC CDR Call Schedule CI': [],   # list for files `FFIEC CDR Call Schedule CI`
    'FFIEC CDR Call Schedule ENT': [],  # list for files `FFIEC CDR Call Schedule ENT`
}

# --- first loop: read every file into the list for its type ---

for f in all_files:
    for x in mylist:
        if x in f:
            print(x)
            df = pd.read_csv(f, delimiter='\t', skiprows=1)
            df['file'] = os.path.basename(f)
            all_df[x].append(df)

# --- second loop: write one CSV per type ---

for x in mylist:
    if not all_df[x]:
        continue  # no files of this type; pd.concat raises on an empty list
    df_append = pd.concat(all_df[x], ignore_index=True, sort=True)
    df_append.to_csv("C:\\Users\\ryans\\OneDrive\\Desktop\\" + x + ".csv")
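
As a variation (not the code above, just one possible arrangement), the two loops can be folded into one by running a separate glob for each type. The function name merge_by_prefix and its src_dir and out_dir parameters are hypothetical; the read_csv and concat settings are copied from the question. The sketch assumes pandas is installed and skips any type that has no matching files.

```python
import glob
import os

import pandas as pd

def merge_by_prefix(src_dir, out_dir, prefixes):
    """Write one CSV per prefix and return the paths that were written."""
    written = []
    for prefix in prefixes:
        # glob.escape keeps characters such as [ or ] in the path literal
        pattern = glob.escape(os.path.join(src_dir, prefix)) + "*.txt"
        frames = []
        for f in sorted(glob.glob(pattern)):
            df = pd.read_csv(f, delimiter='\t', skiprows=1)
            df['file'] = os.path.basename(f)  # remember the source file
            frames.append(df)
        if not frames:
            continue  # nothing matched this type, so there is nothing to write
        out_path = os.path.join(out_dir, prefix + ".csv")
        pd.concat(frames, ignore_index=True, sort=True).to_csv(out_path)
        written.append(out_path)
    return written
```

It would be called with the folder from the question as src_dir, the Desktop folder as out_dir, and mylist as prefixes.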
