
Merge excel files with multiple sheets into one dataframe

I'm new to pandas and I'm trying to combine a lot of Excel files from a folder (each file contains two sheets) and then add only certain columns from those sheets to a new DataFrame. Each file has the same number of columns and the same sheet names, but sometimes a different number of rows.

I'll show what I did using an example with two files. Screenshots of the sheets:

[Screenshot: first sheet]

[Screenshot: second sheet]

The sheets in the second file have the same structure, but different data in them.

Code:

import pandas as pd
import os
folder = [file for file in os.listdir('./test_folder/')]

consolidated = pd.DataFrame()

for file in folder:
    first = pd.concat(pd.read_excel('./test_folder/'+file, sheet_name=['first']))
    second = pd.concat(pd.read_excel('./test_folder/'+file, sheet_name=['second']))
    first_new = first.drop(['Col_K', 'Col_L', 'Col_M'], axis=1) #dropping unnecessary columns
    second_new = second.drop(['Col_DD', 'Col_EE', 'Col_FF','Col_GG','Col_HH', 'Col_II', 'Col_JJ', 'Col_KK', 'Col_LL', 'Col_MM', 'Col_NN', 'Col_OO', 'Col_PP', 'Col_QQ', 'Col_RR', 'Col_SS', 'Col_TT'], axis=1) #dropping unnecessary columns
    frames = [consolidated, second_new, first_new]
    consolidated = pd.concat(frames, axis=0)

consolidated.to_excel('all.xlsx', index=True)
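A likely source of the index problem: when `sheet_name` is given a list, `pd.read_excel` returns a dict of DataFrames keyed by sheet name, and `pd.concat` on a dict turns those keys into an extra outer index level. So `first_new` and `second_new` end up with row labels like `('first', 0)` and `('second', 0)`, which never match each other. A minimal sketch with in-memory frames standing in for the two sheets (the column names and values are made up):

```python
import pandas as pd

# Stand-ins for what pd.read_excel(path, sheet_name=['first']) and
# pd.read_excel(path, sheet_name=['second']) return: a dict mapping each
# requested sheet name to a DataFrame.
first_sheets = {"first": pd.DataFrame({"Col_A": [1, 2], "Col_B": [3, 4]})}
second_sheets = {"second": pd.DataFrame({"Col_AA": [5, 6], "Col_BB": [7, 8]})}

# pd.concat on a dict uses the keys as an extra (outer) index level.
first = pd.concat(first_sheets)
second = pd.concat(second_sheets)

print(first.index.tolist())   # [('first', 0), ('first', 1)]
print(second.index.tolist())  # [('second', 0), ('second', 1)]

# Even side by side, no row labels match, so the result has 4 rows, not 2,
# with NaN wherever a label is missing from one of the frames.
side_by_side = pd.concat([second, first], axis=1)
print(side_by_side.shape)  # (4, 4)
```

Passing a single string instead, e.g. `sheet_name='first'`, makes `read_excel` return one DataFrame with a default 0..n-1 index, so the `pd.concat` wrapper around it is not needed.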

Here is the result I get:

And here is my desired result:

So basically, I don't know how to get rid of these empty cells and align the two DataFrames with each other. Most likely there's a problem with the DataFrames' indexes (first_new, second_new), but I don't know how to resolve it.
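The empty cells come from `axis=0`: it stacks frames vertically, so when the two frames share no column names, the result has the union of their columns and each row is NaN in the other frame's columns. A sketch with made-up frames standing in for `second_new` and `first_new`:

```python
import pandas as pd

# Hypothetical per-file frames with disjoint columns, as after the drops.
first_new = pd.DataFrame({"Col_A": [1, 2], "Col_B": [3, 4]})
second_new = pd.DataFrame({"Col_AA": [5, 6], "Col_BB": [7, 8]})

# axis=0 stacks rows: 4 rows x 4 columns, half of the cells empty (NaN).
stacked = pd.concat([second_new, first_new], axis=0)
print(stacked.shape)  # (4, 4)

# axis=1 places the frames side by side, matching rows by index label.
side_by_side = pd.concat([second_new, first_new], axis=1)
print(side_by_side.shape)  # (2, 4)
```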

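With axis=1, pd.concat() aligns the frames on their row index, so if your rows have differing indices across the individual frames, reset them first (for example with reset_index(drop=True)). The ignore_index parameter, used with axis=1, only relabels the columns as 0, 1, 2, ...; if the frames have a common index (like in my example), you do not need ignore_index and can keep the column names.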
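A sketch of the row-alignment behaviour with made-up frames: with `axis=1`, rows are matched by index label, so frames whose labels differ produce extra rows full of NaN. Resetting each index with `reset_index(drop=True)` is one way to line the rows up by position instead.

```python
import pandas as pd

# Two frames whose row labels differ (e.g. left over from earlier concats).
left = pd.DataFrame({"A": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"B": [3, 4]}, index=[10, 11])

# axis=1 aligns on row labels; with no labels in common, every label gets
# its own row and the missing values are NaN.
misaligned = pd.concat([left, right], axis=1)
print(misaligned.shape)  # (4, 2)

# Resetting both indexes to 0..n-1 first lines the rows up by position.
aligned = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)
print(aligned.shape)  # (2, 2)
```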

Try:

pd.concat(frames, axis=1, ignore_index=True)

An example of the difference:
In [5]: df1 = pd.DataFrame({"A":2, "B":3}, index=[0, 1])

In [6]: df1
Out[6]:
   A  B
0  2  3
1  2  3

In [7]: df2 = pd.DataFrame({"AAA":22, "BBB":33}, index=[0, 1])

In [10]: df = pd.concat([df1, df2], axis=1, ignore_index=True)

In [11]: df
Out[11]:
   0  1   2   3
0  2  3  22  33
1  2  3  22  33

In [12]: df = pd.concat([df1, df2], axis=1, ignore_index=False)

In [13]: df
Out[13]:
   A  B  AAA  BBB
0  2  3   22   33
1  2  3   22   33
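Putting the pieces together for the question's loop: a sketch, assuming every workbook has sheets named `first` and `second`, that joins the two sheets of each file side by side (after resetting their indexes) and only stacks the files vertically at the end. `combine_workbook`, `consolidate_folder`, `FIRST_DROP` and `SECOND_DROP` are hypothetical names, the drop lists are shortened from the question's, and skipping non-Excel files by extension is an assumption.

```python
import os

import pandas as pd

# Hypothetical, shortened column lists; substitute the full lists from the question.
FIRST_DROP = ["Col_K", "Col_L", "Col_M"]
SECOND_DROP = ["Col_DD", "Col_EE", "Col_FF"]


def combine_workbook(sheets):
    """Join the two sheets of one workbook side by side.

    `sheets` is the dict returned by
    pd.read_excel(path, sheet_name=["first", "second"]).
    """
    first = sheets["first"].drop(columns=FIRST_DROP).reset_index(drop=True)
    second = sheets["second"].drop(columns=SECOND_DROP).reset_index(drop=True)
    # Same column order as the question's frames list: second, then first.
    return pd.concat([second, first], axis=1)


def consolidate_folder(folder):
    """Combine every workbook in `folder`, one block of rows per file."""
    per_file = []
    for name in sorted(os.listdir(folder)):
        # Assumption: only .xlsx/.xls files in the folder are workbooks to read.
        if not name.endswith((".xlsx", ".xls")):
            continue
        path = os.path.join(folder, name)
        sheets = pd.read_excel(path, sheet_name=["first", "second"])
        per_file.append(combine_workbook(sheets))
    # Stack the per-file frames vertically and renumber rows 0..n-1.
    return pd.concat(per_file, axis=0, ignore_index=True)


# Usage (needs the Excel files, so not run here):
# consolidate_folder("./test_folder/").to_excel("all.xlsx", index=False)
```

Collecting the per-file frames in a list and calling `pd.concat` once avoids re-copying `consolidated` on every iteration. If the two sheets of one file have different row counts, the shorter one is padded with NaN in the side-by-side join.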
