Consider the following dataframe:
import pandas as pd

df = pd.DataFrame({'group1': ['A', 'A', 'A', 'A',
                              'A', 'A', 'A', 'A'],
                   'group2': ['C', 'C', 'C', 'C',
                              'C', 'E', 'E', 'E'],
                   'time': [-6, -5, -4, -3, -2, -6, -3, -4],
                   'col': [1, 2, 3, 4, 5, 6, 7, 8]})
df
Out[36]:
col group1 group2 time
0 1 A C -6
1 2 A C -5
2 3 A C -4
3 4 A C -3
4 5 A C -2
5 6 A E -6
6 7 A E -3
7 8 A E -4
My objective is to create a column that contains, for each group in ['group1','group2'], the ratio of col evaluated at time = -6 divided by col evaluated at time = -4.
That is, for group ['A','C'], I expect this column to be equal to 1/3; for group ['A','E'] it is 6/8. Both group1 and group2 take on many different values in the data.
How can I get that in Pandas?
Something like
df.groupby(['group1','group2']).transform(lambda x: x.ix[x['time'] == -6,'col'] / x.ix[x['time'] == -4,'col'])
does not work. Any ideas?
Thanks!
You could do it without groupby, like this:
dfm = pd.merge(df[df.time == -4],df[df.time == -6],on=["group1","group2"])
dfm['Div'] = dfm.col_y.div(dfm.col_x)
df = pd.merge(df,dfm[['group1','group2','Div']],on=["group1","group2"])
Output:
col group1 group2 time Div
0 1 A C -6 0.333333
1 2 A C -5 0.333333
2 3 A C -4 0.333333
3 4 A C -3 0.333333
4 5 A C -2 0.333333
5 6 A E -6 0.750000
6 7 A E -3 0.750000
7 8 A E -4 0.750000
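A variant of the same merge idea, as a minimal sketch: it renames the two col columns up front instead of relying on the automatic _x/_y suffixes, and uses a left merge so that groups missing one of the two times stay in the result with NaN. The names keys, num, den, ratio and result are illustrative only, and the sketch assumes each time value occurs at most once per group.

```python
import pandas as pd

df = pd.DataFrame({
    'group1': ['A'] * 8,
    'group2': ['C', 'C', 'C', 'C', 'C', 'E', 'E', 'E'],
    'time': [-6, -5, -4, -3, -2, -6, -3, -4],
    'col': [1, 2, 3, 4, 5, 6, 7, 8],
})

keys = ['group1', 'group2']

# One row per group for each reference time
# (assumption: each time occurs at most once per group).
num = df.loc[df['time'] == -6, keys + ['col']].rename(columns={'col': 'num'})
den = df.loc[df['time'] == -4, keys + ['col']].rename(columns={'col': 'den'})

# Pair numerator and denominator by group, then take the ratio.
ratio = num.merge(den, on=keys, how='inner')
ratio['Div'] = ratio['num'] / ratio['den']

# Left merge keeps every original row; a group without both times gets NaN.
result = df.merge(ratio[keys + ['Div']], on=keys, how='left')
print(result)
```

If a time can appear more than once within a group, the inner merge would produce several rows per group, so the data would need to be de-duplicated or aggregated first.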
Here is your solution as one ridiculously long list comprehension (the most Pythonic way, by the way). Also, your question makes sense, but the ratio for group A, C that you listed as 1/4 is actually 1/3.
# For each (group1, group2) pair, pick col at time == -6 and at time == -4
summary = [(name,
            group[group.time == -6].col.values[0],
            group[group.time == -4].col.values[0])
           for name, group in df.groupby(['group1', 'group2'])]
pd.DataFrame(summary, columns=['group', 'numerator', 'denominator'])
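The summary above stops at the numerator and denominator. One way to get from there to the column asked for might look like the sketch below. It is not part of the original answer: the names rows, by_time, summary and out are made up for illustration, and it assumes each time value appears at most once per group. A missing time yields NaN instead of raising an IndexError.

```python
import pandas as pd

df = pd.DataFrame({
    'group1': ['A'] * 8,
    'group2': ['C', 'C', 'C', 'C', 'C', 'E', 'E', 'E'],
    'time': [-6, -5, -4, -3, -2, -6, -3, -4],
    'col': [1, 2, 3, 4, 5, 6, 7, 8],
})

keys = ['group1', 'group2']
nan = float('nan')

rows = []
for (g1, g2), group in df.groupby(keys):
    # Index col by time so the two reference values can be looked up by label
    # (assumption: times are unique within a group).
    by_time = group.set_index('time')['col']
    rows.append((g1, g2, by_time.get(-6, nan), by_time.get(-4, nan)))

summary = pd.DataFrame(rows, columns=keys + ['numerator', 'denominator'])
summary['ratio'] = summary['numerator'] / summary['denominator']

# Broadcast the per-group ratio back onto every row of the original frame.
out = df.merge(summary[keys + ['ratio']], on=keys, how='left')
print(out)
```

Keeping group1 and group2 as separate columns in summary, rather than a single tuple-valued group column, is what makes the final merge back onto df straightforward.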