简体   繁体   中英

Iterate through multiple dataframes and perform calculation based on specific column

I have several data frames that have months as the columns, and contain integer values. I am posting 2 for this example.

df1 =     
             June 2016       July 2016
Flavor
Vanilla      17.0            23.0
Chocolate    7.0             12.0
Strawberry   11.0            14.0

df2 =        
             June 2016       July 2016
Flavor
Vanilla      9.0            19.0
Chocolate    10.0           3.0

How can I iterate through each dataframe and perform a calculation dependent on the row and column name of the dataframe when they have to match? For example, I want to calculate the average for Vanilla for July, which would be (23 + 19)/2. If a Flavor also does not exist in a data frame, then I would also like to assign a constant value (say 15 in this example) per month in that data frame. Would I append the data frames together then apply .mean() ?

Thanks in advance, and sorry for any abruptness, I am currently on the go, traveling.

Thanks!

Using groupby with columns

pd.concat([df1,df2],1).fillna(15).groupby(level=0,axis=1).mean()
Out[408]: 
            July2016  June2016
Chocolate        7.5       8.5
Strawberry      14.5      13.0
Vanilla         21.0      13.0

Consider directly vectorize as you can run arithmetic operations across similar structured dataframes:

(df1 + df2.reindex(labels=df1.index.values, fill_value=15)) / 2

#             June 2016  July 2016
# Flavor                          
# Vanilla          13.0       21.0
# Chocolate         8.5        7.5
# Strawberry       13.0       14.5

And for many dataframes in a list, consider reduce :

from functools import reduce

df_list = [df1, df2]
new_df_list = [d.reindex(labels=df1.index.values, fill_value=15) for d in df_list]

reduce(lambda x,y: x + y, new_df_list) / len(new_df_list)

#             June 2016  July 2016
# Flavor                          
# Vanilla          13.0       21.0
# Chocolate         8.5        7.5
# Strawberry       13.0       14.5

Data

import pandas as pd
from io import StringIO

txt = '''             
Flavor      "June 2016"       "July 2016"
Vanilla      17.0            23.0
Chocolate    7.0             12.0
Strawberry   11.0            14.0'''

df1 = pd.read_table(StringIO(txt), sep="\s+", index_col=0)


txt = '''             
Flavor      "June 2016"       "July 2016"
Vanilla      9.0            19.0
Chocolate    10.0           3.0'''

df2 = pd.read_table(StringIO(txt), sep="\s+", index_col=0)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM