Creating unique dataframe from all csv files stored in different folders

Question

I have several folders, each of which stores several CSV files. I would like to combine them all into one single file/dataframe using a function in Python.

A folder called Main_Folder has three subfolders: Folder from A, Folder from B and Folder from C. Folder A contains three CSV files:

filename1+key;
filename2+board;
filename3+cat;

Similarly, folder B contains:

filename1+tast;
filename2+board_1;
filename3+dog;

and folder C contains:

filename+test;
filename+b;
filename+d;

What I have tried is

from os import listdir
from os.path import isfile, join

import pandas as pd

def create_dataframe(nam):
    path = "path/Folder from "+nam+"/"
    files = [f.split('.')[0] for f in listdir(path) if isfile(join(path, f))]

    dataframe={}
    for file in files:
        dataframe[file] = pd.read_csv(path+file+'.csv')

but it does not seem to work (there is no output when I call the function). I think my approach is wrong. My desired output is a single dataframe containing all files from the three folders (A, B and C), with two extra columns: one holding A, B or C (i.e. telling me which dataset each row comes from) and another one holding the file name.

Something like this:

Col1  Col2  Col3  Col4  ...  Source  FileName
..    ..    ..    ..    ...  A       filename1+key
..    ..    ..    ..    ...  ..      ..
..    ..    ..    ..    ...  A       filename3+cat
..    ..    ..    ..    ...  B       filename1+tast
..    ..    ..    ..    ...  ..      ..
..    ..    ..    ..    ...  B       filename3+dog
..    ..    ..    ..    ...  C       filename+test
..    ..    ..    ..    ...  ..      ..
..    ..    ..    ..    ...  C       filename+d

Please let me know if you need more details or if you have any questions on this.

Answer 1

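Your function is not working because it does not return anything: the dictionary of dataframes is built inside the function and then discarded when the function ends.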
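A minimal sketch of the fix for the question's own function, assuming the `path/Folder from <name>/` layout from the question. The essential change is the final `return`; the `base` parameter is a hypothetical addition so the prefix is not hard-coded, and the `.endswith(".csv")` filter is an extra safeguard the original did not have.

```python
from os import listdir
from os.path import isfile, join, splitext

import pandas as pd


def create_dataframe(nam, base="path"):
    """Return a dict mapping each file name (without .csv) to its DataFrame."""
    # 'base' is a hypothetical parameter replacing the hard-coded "path" prefix.
    path = join(base, "Folder from " + nam)
    # Keep regular .csv files only; splitext drops just the final extension.
    files = [splitext(f)[0] for f in listdir(path)
             if isfile(join(path, f)) and f.endswith(".csv")]

    dataframes = {}
    for file in files:
        dataframes[file] = pd.read_csv(join(path, file + ".csv"))
    # The missing piece in the question: hand the result back to the caller.
    return dataframes
```

Calling `frames_a = create_dataframe("A")` then gives you a dict you can inspect; this still yields one dataframe per file rather than a single combined one.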

To combine different dataframes, you can use the pd.concat function.
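A small in-memory sketch of what `pd.concat` does when it receives a dict: the keys become outer index levels (tuple keys are split into one level each) and each frame's own row index becomes the last level. The data and level names here are hypothetical stand-ins for the CSV contents.

```python
import pandas as pd

# Two tiny frames standing in for CSV files (hypothetical data).
frames = {
    ("A", "filename1+key"): pd.DataFrame({"col1": [1, 2]}),
    ("B", "filename1+tast"): pd.DataFrame({"col1": [3]}),
}

# Tuple keys -> two outer levels; the original row index -> third level.
combined = pd.concat(frames, names=["source", "file", "row"])
print(combined)
```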

For example:

from os import listdir
from os.path import isfile, join

import pandas as pd

def create_dataframe(paths):
    """ Creates combined dataframe from csv files in paths """

    def get_files_in_path(path):
        return [f.split('.')[0] for f in listdir(path) if isfile(join(path, f))]

    dataframes = {
        (path, file): pd.read_csv(path + file + '.csv')
        for path in paths
        for file in get_files_in_path(path)
    }

    df = pd.concat(dataframes, names=['path', 'file', '_'])
    return df

paths = [f"path/Folder from {name}/" for name in ['A', 'B', 'C']]
df = create_dataframe(paths)
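If you prefer the exact columns asked for in the question instead of a MultiIndex, here is an alternative sketch (not the approach above) that adds `Source` and `FileName` to each file with `DataFrame.assign` before concatenating. It assumes the `<main_folder>/Folder from <name>/*.csv` layout; the function name `combine_csv_folders` is hypothetical.

```python
from pathlib import Path

import pandas as pd


def combine_csv_folders(main_folder, names=("A", "B", "C")):
    """Stack every CSV under '<main_folder>/Folder from <name>/' into one DataFrame.

    Adds 'Source' (the folder letter) and 'FileName' (file name without .csv).
    """
    frames = []
    for name in names:
        folder = Path(main_folder) / f"Folder from {name}"
        # glob matches only .csv files; sorted() gives a deterministic order.
        for csv_path in sorted(folder.glob("*.csv")):
            df = pd.read_csv(csv_path)
            frames.append(df.assign(Source=name, FileName=csv_path.stem))
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each file's index.
    return pd.concat(frames, ignore_index=True)
```

For example, `combine_csv_folders("Main_Folder")`. Files with different columns are aligned by name and padded with NaN, and `pd.concat` raises a ValueError if no CSV files are found at all.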

You can also call df.reset_index(inplace=True) to convert the index levels into regular columns.
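A sketch of going from the MultiIndex result above to the question's `Source`/`FileName` layout, shown on a small in-memory stand-in for `df` (hypothetical data). It uses the non-inplace form of `reset_index` and assumes the folder paths follow the `Folder from <letter>/` pattern.

```python
import pandas as pd

# Stand-in for the answer's result, with index levels 'path', 'file', '_'.
frames = {
    ("path/Folder from A/", "filename1+key"): pd.DataFrame({"col1": [1, 2]}),
    ("path/Folder from B/", "filename3+dog"): pd.DataFrame({"col1": [3]}),
}
df = pd.concat(frames, names=["path", "file", "_"])

# Move the index levels into columns and drop the per-file row counter.
flat = df.reset_index().drop(columns="_")
# Pull the folder letter out of the path, e.g. "path/Folder from A/" -> "A".
flat["Source"] = flat["path"].str.extract(r"Folder from ([^/]+)", expand=False)
flat = flat.rename(columns={"file": "FileName"}).drop(columns="path")
```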

(Accepted answer, 2020-06-28 17:55:37)