
Creating unique dataframe from all csv files stored in different folders

I have several folders, each of which stores several csv files. I would like to combine them into one single file/dataframe using a function in Python.

A folder called Main_Folder has 3 subfolders: Folder from A, Folder from B, and Folder from C. Folder A contains three csv files:

  • filename1+key;
  • filename2+board;
  • filename3+cat;

The other two folders are similar. B contains:

  • filename1+tast;
  • filename2+board_1;
  • filename3+dog;

and C

  • filename+test;
  • filename+b;
  • filename+d;

What I have tried is:

def create_dataframe(nam):
    path = "path/Folder from "+nam+"/"
    files = [f.split('.')[0] for f in listdir(path) if isfile(join(path, f))]

    dataframe={}
    for file in files:
         dataframe[file] = pd.read_csv(path+file+'.csv')

but it does not seem to work (there is no output when I call the function). I think my approach is wrong. My desired output is a single dataframe containing all files from the three folders (A, B, and C), with two extra columns: one for the source folder (A, B, or C, i.e. which dataset the row comes from) and another for the filename.

Something like this:

Col1 Col2 Col3 Col4 .... Source  FileName
.. .. .. .. .. .. ..       A    filename1+key
.. .. .. .. .. .. ..      ..    ..
.. .. .. .. .. .. ..       A    filename3+cat
.. .. .. .. .. .. ..       B    filename1+tast
.. .. .. .. .. .. ..      ..    ..
.. .. .. .. .. .. ..       B    filename3+dog
.. .. .. .. .. .. ..       C    filename+test
.. .. .. .. .. .. ..      ..    ..
.. .. .. .. .. .. ..       C    filename+d

Please let me know if you need more details or if you have any questions on this.

Your function produces no output because it does not return anything: the dataframe dictionary is built inside the function and discarded when the function ends.
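As a minimal illustration of that point (the function names below are made up for this sketch and are not part of the question), a Python function without a return statement always evaluates to None, no matter what it computes internally:

```python
def without_return(values):
    # The result is computed, then thrown away when the function ends.
    total = sum(values)


def with_return(values):
    # Same computation, but the result is handed back to the caller.
    total = sum(values)
    return total


print(without_return([1, 2, 3]))  # None
print(with_return([1, 2, 3]))     # 6
```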

To combine several dataframes into one, you can use the pd.concat function.

For example:

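from os import listdir
from os.path import isfile, join, splitext

import pandas as pd

def create_dataframe(paths):
    """ Creates combined dataframe from csv files in paths """

    def get_files_in_path(path):
        # Base names (without extension) of the csv files directly inside path
        return [splitext(f)[0] for f in listdir(path)
                if isfile(join(path, f)) and f.endswith('.csv')]

    # Key each dataframe by (path, file) so that concat records where it came from
    dataframes = {
        (path, file): pd.read_csv(join(path, file + '.csv'))
        for path in paths
        for file in get_files_in_path(path)
    }

    # The keys become the 'path' and 'file' index levels; '_' is the original row index
    df = pd.concat(dataframes, names=['path', 'file', '_'])
    return df

paths = [f"path/Folder from {name}/" for name in ['A', 'B', 'C']]
df = create_dataframe(paths)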

You can also call df.reset_index(inplace=True) to convert the index levels (path, file, and the original row index) into ordinary columns.
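A rough sketch of how this could lead to the Source and FileName columns asked for in the question: the snippet uses two tiny in-memory dataframes as stand-ins for the csv files (their contents are made up), and a hypothetical helper to_source that assumes the folder names follow the "Folder from X" pattern.

```python
import pandas as pd

# Stand-ins for the csv files; the contents are made up for illustration.
dataframes = {
    ("path/Folder from A/", "filename1+key"): pd.DataFrame({"Col1": [1, 2]}),
    ("path/Folder from B/", "filename3+dog"): pd.DataFrame({"Col1": [3]}),
}

# Same call as in create_dataframe: the dict keys become index levels.
df = pd.concat(dataframes, names=["path", "file", "_"])

# Move the three index levels (path, file, _) into ordinary columns.
df = df.reset_index()


def to_source(path):
    # Hypothetical helper: "path/Folder from A/" -> "A".
    return path.rstrip("/").split("Folder from ")[-1]


df["Source"] = df["path"].map(to_source)
df = df.rename(columns={"file": "FileName"})

# Keep the data column(s) first, then the two provenance columns.
result = df[["Col1", "Source", "FileName"]]
print(result)
```

With real files, the same steps would apply to the dataframe returned by create_dataframe; only the list of data columns would differ.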
