
How to slice and combine specific row values in a Pandas groupby?

Consider the following dataframe

import pandas as pd

df = pd.DataFrame({'group1' : ['A', 'A', 'A', 'A',
                         'A', 'A', 'A', 'A'],
                   'group2' : ['C', 'C', 'C', 'C',
                         'C', 'E', 'E', 'E'],
                   'time' : [-6,-5,-4,-3,-2,-6,-3,-4] , 
                   'col': [1,2,3,4,5,6,7,8]})

df
Out[36]: 
   col group1 group2  time
0    1      A      C    -6
1    2      A      C    -5
2    3      A      C    -4
3    4      A      C    -3
4    5      A      C    -2
5    6      A      E    -6
6    7      A      E    -3
7    8      A      E    -4

My objective is to create a column that contains, for each group in ['group1','group2'], the ratio of col evaluated at time = -6 to col evaluated at time = -4.

That is, for group ['A','C'] I expect this column to be equal to 1/3, and for group ['A','E'] it is 6/8. Both group1 and group2 take on many different values in the data.

How can I get that in Pandas?

Something like

 df.groupby(['group1','group2']).transform(lambda x: x.ix[x['time'] == -6,'col'] / x.ix[x['time'] == -4,'col'])

does not work. Any ideas?

Thanks!

You could do it without groupby like this:

dfm = pd.merge(df[df.time == -4],df[df.time == -6],on=["group1","group2"])
dfm['Div'] = dfm.col_y.div(dfm.col_x)
df = pd.merge(df,dfm[['group1','group2','Div']],on=["group1","group2"])

Output:

   col group1 group2  time       Div
0    1      A      C    -6  0.333333
1    2      A      C    -5  0.333333
2    3      A      C    -4  0.333333
3    4      A      C    -3  0.333333
4    5      A      C    -2  0.333333
5    6      A      E    -6  0.750000
6    7      A      E    -3  0.750000
7    8      A      E    -4  0.750000
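
For reference, the merge approach above can be sketched as a self-contained script. This is only one possible variant: the names keys, num, den, ratios and result, the explicit suffixes, and the how='left' on the final merge are assumptions added here so that groups missing one of the two time values would keep their rows (with NaN) instead of being dropped.

```python
import pandas as pd

df = pd.DataFrame({
    'group1': ['A'] * 8,
    'group2': ['C', 'C', 'C', 'C', 'C', 'E', 'E', 'E'],
    'time': [-6, -5, -4, -3, -2, -6, -3, -4],
    'col': [1, 2, 3, 4, 5, 6, 7, 8],
})

keys = ['group1', 'group2']  # hypothetical helper name for the grouping columns

# One row per group for the numerator (time == -6) and denominator (time == -4)
num = df.loc[df['time'] == -6, keys + ['col']]
den = df.loc[df['time'] == -4, keys + ['col']]

# Align numerator and denominator by group, then divide
ratios = num.merge(den, on=keys, suffixes=('_num', '_den'))
ratios['Div'] = ratios['col_num'] / ratios['col_den']

# Assumption: a left merge keeps every original row, even for groups
# that have no ratio (those rows would get NaN in 'Div')
result = df.merge(ratios[keys + ['Div']], on=keys, how='left')
print(result)
```

The explicit suffixes make it easier to see which column is the numerator, compared with relying on the default col_x / col_y names.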

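Here is your solution as a ridiculously long list comprehension (the most Pythonic way, btw). It collects the numerator (col at time = -6) and the denominator (col at time = -4) for each group. Also, your question makes sense, but the ratio for group A,C that you listed as 1/4 is actually 1/3.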

summary = [(name,group[group.time == -6].col.values[0],group[group.time == -4].col.values[0]) for name,group in df.groupby(['group1','group2'])]

pd.DataFrame(summary, columns=['group', 'numerator', 'denominator'])
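
The summary above stops at the numerator and denominator. A minimal sketch of taking it one step further, computing the ratio and attaching it to every row, might look like the following. The names rows, ratios and result are made up for illustration, and the code assumes every group has at least one row at time = -6 and one at time = -4 (otherwise .iloc[0] raises an IndexError).

```python
import pandas as pd

df = pd.DataFrame({
    'group1': ['A'] * 8,
    'group2': ['C', 'C', 'C', 'C', 'C', 'E', 'E', 'E'],
    'time': [-6, -5, -4, -3, -2, -6, -3, -4],
    'col': [1, 2, 3, 4, 5, 6, 7, 8],
})

rows = []
for (g1, g2), group in df.groupby(['group1', 'group2']):
    # Assumption: each group contains both time values
    numerator = group.loc[group['time'] == -6, 'col'].iloc[0]
    denominator = group.loc[group['time'] == -4, 'col'].iloc[0]
    rows.append((g1, g2, numerator / denominator))

# One row per group, with the group keys as separate columns
ratios = pd.DataFrame(rows, columns=['group1', 'group2', 'ratio'])

# Broadcast the per-group ratio back onto the original rows
result = df.merge(ratios, on=['group1', 'group2'])
print(result)
```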

Another way using groupby with a custom function:

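def time_selection(group):
    # `group` is the sub-DataFrame for one (group1, group2) pair
    N_r = group.loc[group['time'] == -6, 'col'].squeeze()  # col at time == -6
    D_r = group.loc[group['time'] == -4, 'col'].squeeze()  # col at time == -4
    return N_r / D_r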

pd.merge(df, df.groupby(['group1','group2']).apply(time_selection).reset_index(name='div'))
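
Yet another possibility, not taken from the answers above but sketched here as an alternative, is to reshape the data so that each time value becomes a column and then divide two columns. The names keys, wide, ratio and result are hypothetical, and aggfunc='first' is an assumption about how to handle a group that has more than one row for the same time.

```python
import pandas as pd

df = pd.DataFrame({
    'group1': ['A'] * 8,
    'group2': ['C', 'C', 'C', 'C', 'C', 'E', 'E', 'E'],
    'time': [-6, -5, -4, -3, -2, -6, -3, -4],
    'col': [1, 2, 3, 4, 5, 6, 7, 8],
})

keys = ['group1', 'group2']

# One row per group, one column per time value; missing combinations become NaN
wide = df.pivot_table(index=keys, columns='time', values='col', aggfunc='first')

# Column-wise division: col at time -6 over col at time -4
ratio = (wide[-6] / wide[-4]).rename('div')

# Join the per-group ratio back onto the original rows via the group keys
result = df.join(ratio, on=keys)
print(result)
```

With this layout a group that lacks one of the two time values simply ends up with NaN in div rather than raising an error.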
