
How to join multiple dataframes within a loop using python pandas

I have three tables, one per Excel sheet: sheet1 - Gross, sheet2 - Margin, sheet3 - Revenue.

So I was able to iterate through each sheet and unpivot it.

But how can I join them together?


    import pandas as pd

    sheet_names = ['Gross', 'Margin', 'Revenue']

    full_table = pd.DataFrame()
    for sheet in sheet_names:
        df = pd.read_excel('BudgetData.xlsx', sheet_name=sheet)
        unpvt = pd.melt(df, id_vars=['Company'], var_name='Month', value_name=sheet)
        # how can I join unpivoted dataframes here?
        print(unpvt)


Desired result:

[image]

UPDATE:

Thanks @Celius Stingher. I think this is what I need. It just gives me weird sorting:

[image]

and gives me this warning:

    Sorting because non-concatenation axis is not aligned. A future version
    of pandas will change to not sort by default.

    To accept the future behavior, pass 'sort=False'.

    To retain the current behavior and silence the warning, pass 'sort=True'.

      from ipykernel import kernelapp as app

It seems you are unpivoting each sheet but not saving the unpivoted dataframes anywhere. Let's create a list that stores each unpivoted dataframe. After the loop, we pass that list to pd.concat to perform the concatenation.

    import pandas as pd

    sheet_names = ['Gross', 'Margin', 'Revenue']
    list_of_df = []
    for sheet in sheet_names:
        df = pd.read_excel('BudgetData.xlsx', sheet_name=sheet)
        # Unpivot: one row per (Company, Month), value column named after the sheet
        df = pd.melt(df, id_vars=['Company'], var_name='Month', value_name=sheet)
        list_of_df.append(df)

    # Stack the unpivoted dataframes on top of each other
    full_df = pd.concat(list_of_df, ignore_index=True)
    full_df = full_df.sort_values(['Company', 'Month'])
    print(full_df)
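
To see what the concatenation produces, here is a minimal sketch that uses small in-memory frames in place of BudgetData.xlsx. The `sheets` dict and all of its values are made up for illustration. Each melted frame has a differently named value column, so the columns do not line up; that is the situation the sorting warning in the question refers to, and passing `sort=False` keeps the columns in order of appearance.

```python
import pandas as pd

# Hypothetical stand-ins for the three sheets; the values are invented.
sheets = {
    "Gross": pd.DataFrame({"Company": ["A", "B"], "Jan": [10, 20], "Feb": [11, 21]}),
    "Margin": pd.DataFrame({"Company": ["A", "B"], "Jan": [1, 2], "Feb": [3, 4]}),
    "Revenue": pd.DataFrame({"Company": ["A", "B"], "Jan": [100, 200], "Feb": [110, 210]}),
}

# Unpivot each sheet, naming the value column after the sheet.
list_of_df = [
    pd.melt(df, id_vars=["Company"], var_name="Month", value_name=name)
    for name, df in sheets.items()
]

# sort=False leaves the non-aligned columns in order of first appearance.
stacked = pd.concat(list_of_df, ignore_index=True, sort=False)

print(stacked.shape)          # (12, 5)
print(list(stacked.columns))  # ['Company', 'Month', 'Gross', 'Margin', 'Revenue']
```

The rows are stacked, so every row has a value in only one of the three value columns and NaN in the other two.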

Edit:

Now that I understand what you need, let's try a different approach. After the loop, try the following code instead of the pd.concat:

    full_df = list_of_df[0].merge(list_of_df[1], on=['Company', 'Month']).merge(list_of_df[2], on=['Company', 'Month'])
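
If the number of sheets may change, the chained merge can be folded over the list instead of being written out by hand. The following is a sketch under assumptions: `merge_all` is a hypothetical helper name, and the sample frames are invented stand-ins for the melted sheets.

```python
from functools import reduce

import pandas as pd


def merge_all(frames, keys=("Company", "Month")):
    """Merge a list of dataframes pairwise, left to right, on the given key columns."""
    return reduce(lambda left, right: left.merge(right, on=list(keys)), frames)


# Hypothetical stand-ins for the three sheets; the values are invented.
sheets = {
    "Gross": pd.DataFrame({"Company": ["A", "B"], "Jan": [10, 20], "Feb": [11, 21]}),
    "Margin": pd.DataFrame({"Company": ["A", "B"], "Jan": [1, 2], "Feb": [3, 4]}),
    "Revenue": pd.DataFrame({"Company": ["A", "B"], "Jan": [100, 200], "Feb": [110, 210]}),
}
list_of_df = [
    pd.melt(df, id_vars=["Company"], var_name="Month", value_name=name)
    for name, df in sheets.items()
]

full_df = merge_all(list_of_df)
print(full_df.shape)          # (4, 5)
print(list(full_df.columns))  # ['Company', 'Month', 'Gross', 'Margin', 'Revenue']
```

Each (Company, Month) pair now occupies a single row with one column per sheet.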

pd.concat will just pile everything together. Based on the 'desired' image in your post, you want to actually merge the DataFrames using pd.merge, which works much like a SQL JOIN statement.

https://pandas.pydata.org/pandas-docs/version/0.19.1/generated/pandas.DataFrame.merge.html

You just need to pass a list of columns to merge on. If you get the sheets into tidy data frames with the same names as your sheets, you would do something like:

    gross.merge(margin, on=['Company', 'Month']).merge(revenue, on=['Company', 'Month'])
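
As with SQL joins, the default merge is an inner join, so a Company/Month pair missing from one sheet drops out of the result. The sketch below assumes, purely for illustration, that company B has no Margin row; the data is invented. `how="outer"` keeps such rows, and `validate="one_to_one"` makes pandas raise an error if a key pair is duplicated on either side.

```python
import pandas as pd

# Invented sample data: company B is missing from the margin frame.
gross = pd.DataFrame({"Company": ["A", "B"], "Month": ["Jan", "Jan"], "Gross": [10, 20]})
margin = pd.DataFrame({"Company": ["A"], "Month": ["Jan"], "Margin": [1]})

# Default how="inner": only keys present in both frames survive.
inner = gross.merge(margin, on=["Company", "Month"])

# how="outer": keep every key; missing values become NaN.
outer = gross.merge(margin, on=["Company", "Month"], how="outer", validate="one_to_one")

print(len(inner))  # 1
print(len(outer))  # 2
```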
