How to merge 300 files into 3 files?

I have a bunch of files in a folder. They come in three different types, each with its own schema. I want to group the files by type.

  1. 'FFIEC CDR Call Bulk POR'
  2. 'FFIEC CDR Call Schedule CI'
  3. 'FFIEC CDR Call Schedule ENT'

I want to save these 300 files as 3 CSV files, grouped by the similar file names.

Here are the actual file names.

FFIEC CDR Call Bulk POR 03312001.txt
FFIEC CDR Call Bulk POR 03312002.txt
...
FFIEC CDR Call Schedule CI 03312001.txt
FFIEC CDR Call Schedule CI 03312002.txt
...
FFIEC CDR Call Schedule ENT 03312001.txt
FFIEC CDR Call Schedule ENT 03312002.txt

I think the problem is with this line:

if x in f:

Here is the code that I am testing.

import os, glob
import pandas as pd

mylist = ['FFIEC CDR Call Bulk POR',
        'FFIEC CDR Call Schedule CI',
        'FFIEC CDR Call Schedule ENT']

path = "C:\\Users\\ryans\\OneDrive\\Desktop\\schemas\\"

all_files = glob.glob(os.path.join(path, "*.txt"))

all_df = []
for f in all_files: 
    for x in mylist:
        if x in f:
            print(x)
            df = pd.read_csv(f, delimiter='\t', skiprows=1) 
            df['file'] = os.path.basename(f)
            all_df.append(df) 

df_append = pd.concat(all_df, ignore_index=True, sort=True)
df_append.to_csv("C:\\Users\\ryans\\OneDrive\\Desktop\\" + x + ".csv")

When I run this code, everything is dumped into one single CSV file. I want three separate CSV files, one for each of the similar/grouped file names. I made some progress, but I couldn't quite figure it out.

Frankly, the problem looks simple enough that I don't see why it is causing you trouble.

You put all the files in the same list

all_df.append(df) 

so in the end everything is written to one CSV.
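One more detail that follows from the question's code: once the nested loops finish, `x` still holds the last value it was assigned, so the single output file is named after the last entry of `mylist`, whichever files went into it. A minimal sketch of that behaviour, using shortened, made-up names in place of the real prefixes and files, and plain strings in place of DataFrames:

```python
# Shortened, hypothetical names standing in for the real prefixes and files.
mylist = ['Bulk POR', 'Schedule CI', 'Schedule ENT']
all_files = ['Bulk POR 03312001.txt', 'Schedule CI 03312001.txt']

all_df = []
for f in all_files:
    for x in mylist:
        if x in f:
            all_df.append(f)  # every match lands in the same list

# The loop variable keeps its last value once the loops are done,
# so the output name is always built from the last entry of mylist.
output_name = x + ".csv"
print(all_df)
print(output_name)
```

Here both files end up in one list, and the name is built from 'Schedule ENT' even though no ENT file was read.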

You should create three lists for the three types of files, or one dictionary holding three lists:

all_df = {
    'FFIEC CDR Call Bulk POR': [],  # list for files `FFIEC CDR Call Bulk POR`
    'FFIEC CDR Call Schedule CI': [], # list for files `FFIEC CDR Call Schedule CI`
    'FFIEC CDR Call Schedule ENT': [], # list for files `FFIEC CDR Call Schedule ENT`
}

Then you can use x to put each file into the correct list

all_df[ x ].append(df) 
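The routing step can be tried without pandas or the real folder. The sketch below groups plain file names instead of DataFrames; the paths are hypothetical, the name `groups` is made up for this example, and matching on the base name with `startswith` is one possible variation on the `x in f` test, not something the original code does.

```python
import os

mylist = [
    'FFIEC CDR Call Bulk POR',
    'FFIEC CDR Call Schedule CI',
    'FFIEC CDR Call Schedule ENT',
]

# Hypothetical paths; only the file names matter here.
all_files = [
    'schemas/FFIEC CDR Call Bulk POR 03312001.txt',
    'schemas/FFIEC CDR Call Schedule CI 03312001.txt',
    'schemas/FFIEC CDR Call Bulk POR 03312002.txt',
    'schemas/FFIEC CDR Call Schedule ENT 03312001.txt',
]

# Same shape as the literal dictionary above, built from mylist.
groups = {x: [] for x in mylist}

for f in all_files:
    name = os.path.basename(f)
    for x in mylist:
        if name.startswith(x):
            groups[x].append(name)
            break  # a file belongs to one group only

for x in mylist:
    print(x, len(groups[x]))
```

Building the dictionary from `mylist` keeps the keys and the list of prefixes from drifting apart if a fourth type is added later.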

After that loop, you can use another loop to save the three files

for x in mylist:
    # use `all_df[x]`
    df_append = pd.concat(all_df[x], ignore_index=True, sort=True)
    df_append.to_csv("C:\\Users\\ryans\\OneDrive\\Desktop\\" + x + ".csv")

Full code:

import os, glob
import pandas as pd

mylist = [
    'FFIEC CDR Call Bulk POR',
    'FFIEC CDR Call Schedule CI',
    'FFIEC CDR Call Schedule ENT'
]

path = "C:\\Users\\ryans\\OneDrive\\Desktop\\schemas\\"

all_files = glob.glob(os.path.join(path, "*.txt"))

all_df = {
    'FFIEC CDR Call Bulk POR': [],  # list for files `FFIEC CDR Call Bulk POR`
    'FFIEC CDR Call Schedule CI': [], # list for files `FFIEC CDR Call Schedule CI`
    'FFIEC CDR Call Schedule ENT': [], # list for files `FFIEC CDR Call Schedule ENT`
}

# --- first loop ---

for f in all_files: 
    for x in mylist:
        if x in f:
            print(x)
            df = pd.read_csv(f, delimiter='\t', skiprows=1) 
            df['file'] = os.path.basename(f)
            all_df[x].append(df) 

# --- second loop ---

for x in mylist:
    # use `all_df[x]`
    df_append = pd.concat(all_df[x], ignore_index=True, sort=True)
    df_append.to_csv("C:\\Users\\ryans\\OneDrive\\Desktop\\" + x + ".csv")
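One caveat with the second loop: `pd.concat` raises a `ValueError` when it is given an empty list, so a type with no matching files would stop the script. The sketch below runs the same two loops in a temporary directory and skips empty groups. The sample files and their layout (one row to skip, a header row, one data row) are invented for the demonstration, as are the `src`, `out` and `written` names, and `index=False` is an optional choice that leaves the row index out of the CSV.

```python
import glob
import os
import tempfile

import pandas as pd

mylist = [
    'FFIEC CDR Call Bulk POR',
    'FFIEC CDR Call Schedule CI',
    'FFIEC CDR Call Schedule ENT',
]

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "schemas")
    out = os.path.join(tmp, "out")
    os.makedirs(src)
    os.makedirs(out)

    # Made-up sample files; the ENT group is deliberately left empty.
    for prefix in mylist[:2]:
        for date in ("03312001", "03312002"):
            with open(os.path.join(src, f"{prefix} {date}.txt"), "w") as fh:
                fh.write("description row\nIDRSSD\tVALUE\n1\t10\n")

    # --- first loop: route each file to the list for its type ---
    all_df = {x: [] for x in mylist}
    for f in glob.glob(os.path.join(src, "*.txt")):
        for x in mylist:
            if x in os.path.basename(f):
                df = pd.read_csv(f, delimiter='\t', skiprows=1)
                df['file'] = os.path.basename(f)
                all_df[x].append(df)

    # --- second loop: one CSV per type, skipping empty groups ---
    written = []
    for x in mylist:
        if not all_df[x]:
            continue  # pd.concat raises ValueError on an empty list
        df_append = pd.concat(all_df[x], ignore_index=True, sort=True)
        df_append.to_csv(os.path.join(out, x + ".csv"), index=False)
        written.append(x + ".csv")

    print(sorted(written))
```

With the sample data only the POR and CI files are written; the ENT group is skipped instead of raising an error.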

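Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.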