I have a folder of about 300 files. Each file follows one of three schemas, and the file name shows which one it is.
I want to group the files by type and save them as 3 CSV files, one for each group of similar file names.
Here are actual file names.
FFIEC CDR Call Bulk POR 03312001.txt
FFIEC CDR Call Bulk POR 03312002.txt
...
FFIEC CDR Call Schedule CI 03312001.txt
FFIEC CDR Call Schedule CI 03312002.txt
...
FFIEC CDR Call Schedule ENT 03312001.txt
FFIEC CDR Call Schedule ENT 03312002.txt
I think the problem is with this line:
if x in f:
Here is the code that I am testing.
import os, glob
import pandas as pd

mylist = ['FFIEC CDR Call Bulk POR',
          'FFIEC CDR Call Schedule CI',
          'FFIEC CDR Call Schedule ENT']

path = "C:\\Users\\ryans\\OneDrive\\Desktop\\schemas\\"
all_files = glob.glob(os.path.join(path, "*.txt"))

all_df = []
for f in all_files:
    for x in mylist:
        if x in f:
            print(x)
            df = pd.read_csv(f, delimiter='\t', skiprows=1)
            df['file'] = os.path.basename(f)
            all_df.append(df)

df_append = pd.concat(all_df, ignore_index=True, sort=True)
df_append.to_csv("C:\\Users\\ryans\\OneDrive\\Desktop\\" + x + ".csv")
When I run this code, everything is dumped into one single CSV file. I want three separate CSV files, one for each of the similar/grouped file names. I made some progress, but I couldn't quite figure it out.
The cause is simple: the test "if x in f" is fine, but you put every file into the same list
all_df.append(df)
so in the end pd.concat combines all of them and everything is written to one CSV.
You should create three lists, one for each type of file, or one dictionary holding three lists:
all_df = {
    'FFIEC CDR Call Bulk POR': [],      # list for files `FFIEC CDR Call Bulk POR`
    'FFIEC CDR Call Schedule CI': [],   # list for files `FFIEC CDR Call Schedule CI`
    'FFIEC CDR Call Schedule ENT': [],  # list for files `FFIEC CDR Call Schedule ENT`
}
Then you can use x as the key to put each DataFrame into the correct list:
all_df[ x ].append(df)
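The routing step can be tried on its own, without reading any data. This is a minimal sketch using file names from the question; `group_by_prefix` is a hypothetical helper name, and matching with `startswith` on the base name (instead of `x in f` on the full path) is an assumption made here so that folder names cannot cause accidental matches.

```python
import os

prefixes = [
    'FFIEC CDR Call Bulk POR',
    'FFIEC CDR Call Schedule CI',
    'FFIEC CDR Call Schedule ENT',
]

def group_by_prefix(paths, prefixes):
    """Return {prefix: [paths whose base name starts with that prefix]}."""
    groups = {p: [] for p in prefixes}
    for path in paths:
        name = os.path.basename(path)
        for p in prefixes:
            if name.startswith(p):
                groups[p].append(path)
                break  # a file belongs to at most one group
    return groups

sample = [
    'FFIEC CDR Call Bulk POR 03312001.txt',
    'FFIEC CDR Call Schedule CI 03312001.txt',
    'FFIEC CDR Call Schedule ENT 03312001.txt',
    'FFIEC CDR Call Bulk POR 03312002.txt',
]

groups = group_by_prefix(sample, prefixes)
for prefix, names in groups.items():
    print(prefix, len(names))
```

Each key ends up with only its own files, which is what keeps the three outputs separate later.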
After that loop, you can use another loop to save the three files:
for x in mylist:
    # use `all_df[x]`
    df_append = pd.concat(all_df[x], ignore_index=True, sort=True)
    df_append.to_csv("C:\\Users\\ryans\\OneDrive\\Desktop\\" + x + ".csv")
Full code:

import os, glob
import pandas as pd

mylist = [
    'FFIEC CDR Call Bulk POR',
    'FFIEC CDR Call Schedule CI',
    'FFIEC CDR Call Schedule ENT'
]

path = "C:\\Users\\ryans\\OneDrive\\Desktop\\schemas\\"
all_files = glob.glob(os.path.join(path, "*.txt"))

all_df = {
    'FFIEC CDR Call Bulk POR': [],      # list for files `FFIEC CDR Call Bulk POR`
    'FFIEC CDR Call Schedule CI': [],   # list for files `FFIEC CDR Call Schedule CI`
    'FFIEC CDR Call Schedule ENT': [],  # list for files `FFIEC CDR Call Schedule ENT`
}

# --- first loop: read every file into the list for its type ---
for f in all_files:
    for x in mylist:
        if x in f:
            print(x)
            df = pd.read_csv(f, delimiter='\t', skiprows=1)
            df['file'] = os.path.basename(f)
            all_df[x].append(df)

# --- second loop: write one CSV per type ---
for x in mylist:
    # use `all_df[x]`
    df_append = pd.concat(all_df[x], ignore_index=True, sort=True)
    df_append.to_csv("C:\\Users\\ryans\\OneDrive\\Desktop\\" + x + ".csv")
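One thing to watch: pd.concat raises a ValueError when it is given an empty list, so the second loop fails if any type has no matching files. The same idea can be wrapped in a function that skips empty groups. This is only a sketch: `combine_by_prefix`, `src_dir` and `out_dir` are hypothetical names, and `index=False` is an added choice (the code above writes the index column).

```python
import glob
import os

import pandas as pd


def combine_by_prefix(src_dir, out_dir, prefixes):
    """Write one CSV per prefix; return {prefix: output path} for non-empty groups."""
    frames = {p: [] for p in prefixes}  # one list of DataFrames per file type

    for f in sorted(glob.glob(os.path.join(src_dir, "*.txt"))):
        name = os.path.basename(f)
        for p in prefixes:
            if name.startswith(p):
                # same read options as in the question: tab-separated, skip first row
                df = pd.read_csv(f, delimiter='\t', skiprows=1)
                df['file'] = name
                frames[p].append(df)
                break  # each file goes into at most one group

    written = {}
    for p, dfs in frames.items():
        if not dfs:
            continue  # nothing matched this prefix; pd.concat would raise on []
        out = os.path.join(out_dir, p + ".csv")
        pd.concat(dfs, ignore_index=True, sort=True).to_csv(out, index=False)
        written[p] = out
    return written
```

Calling `combine_by_prefix(path, "C:\\Users\\ryans\\OneDrive\\Desktop", mylist)` with `path` and `mylist` from the code above should produce the same three CSV files, minus the index column.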