Assign groupby-apply result to parent dataframe

Question

I have the following data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar','foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})

     A      B         C         D
0  foo    one  0.478183 -1.267588
1  bar    one  0.555985 -2.143590
2  foo    two -1.592865  1.251546
3  bar  three  0.174138 -0.708198
4  foo    two  0.302215 -0.219041
5  bar    two -0.034550 -0.965414
6  foo    one  1.310828 -0.388601
7  foo  three  0.357659 -1.610443

I'm trying to add another column containing column C normalized (min-max scaled) within each group of A:

normed = df.groupby('A').apply(lambda x: (x['C']-min(x['C']))/(max(x['C'])-min(x['C'])))

A     
bar  1    0.000000
     3    0.033396
     5    1.000000
foo  0    1.000000
     2    0.413716
     4    0.000000
     6    0.441061
     7    0.357787

Finally, I want to join this result back to df (using advice from a similar question):

df.join(normed, on='A', rsuffix='_normed')

However, I get an error:

ValueError: len(left_on) must equal the number of levels in the index of "right"

How can I add the normed result back to the dataframe df?

Answer 1

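You get this error because normed has a two-level MultiIndex: the first level holds the group keys from A ('bar' and 'foo'), and the second level is the original index. Joining with on='A' supplies only one key, which does not match the two index levels.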

normed.index

Out[35]:

MultiIndex(levels=[['bar', 'foo'], [0, 1, 2, 3, 4, 5, 6, 7]],
           labels=[[0, 0, 0, 1, 1, 1, 1, 1], [1, 3, 5, 0, 2, 4, 6, 7]],
           names=['A', None])

You probably want to join on the original index, so you must drop the first level of the new index:

normed.index = normed.index.droplevel()

before joining:

df.join(normed, rsuffix='_normed')

Answer 2

The simplest way is to call reset_index on normed to drop the group-key level:

normed = df.groupby('A').apply(lambda x: (x['C']-min(x['C']))/(max(x['C'])-min(x['C'])))
normed = normed.reset_index(level=0, drop=True)

Now simply add normed as a column to df:

df['normed'] = normed
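
The assignment works even though normed is ordered by group, because pandas aligns a Series on index labels when it is assigned as a column. A small sketch of that alignment, using made-up values and a hypothetical Series s standing in for normed:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar']})

# A Series whose rows are in a different order from df,
# as they would be after a groupby (hypothetical values).
s = pd.Series([0.1, 0.3, 0.0, 0.2], index=[1, 3, 0, 2])

# Assignment aligns on index labels, not on position.
df['normed'] = s
print(df)
```

Row 0 receives the value labelled 0 in s, row 1 the value labelled 1, and so on.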

Answer 3

Actually, there is a very easy solution. When groupby is doing a one-for-one operation (rather than a reduction), you can use transform and the indexing is already taken care of for you:

df['c_normed'] = df.groupby('A')['C'].transform(lambda x: (x - min(x)) / (max(x) - min(x)))

Also note that the code is a bit cleaner if you use df.groupby('A')['C'], because then you can just use x instead of x['C'] inside the lambda. In this case, using x['C'] works with apply but not with transform (I am not sure why).
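
As a variation on the same idea (not the code above, and with made-up values), transform also accepts the names of built-in reductions such as 'min' and 'max', which broadcasts each group's result back to the original rows. A sketch, assuming a frame with the same A and C columns:

```python
import pandas as pd

# Hypothetical fixed values so the output is reproducible.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'C': [1.0, 10.0, 3.0, 20.0, 5.0, 40.0],
})

grouped = df.groupby('A')['C']

# Each of these has the same index as df, holding the group's min or max per row.
group_min = grouped.transform('min')
group_max = grouped.transform('max')

df['c_normed'] = (df['C'] - group_min) / (group_max - group_min)

# The lambda form from the answer, for comparison.
via_lambda = grouped.transform(lambda x: (x - x.min()) / (x.max() - x.min()))
print(df.assign(via_lambda=via_lambda))
```

Both forms return a result indexed like df, so no index clean-up is needed before assigning.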

Answer 4

You can do the following:

# Get (index, value) pairs for each group.
# list() is needed on Python 3, where zip returns an iterator.
foo = list(zip(normed['foo'].index, normed['foo'].values))
bar = list(zip(normed['bar'].index, normed['bar'].values))

# Merge the two lists
foo.extend(bar)  # the merged list is now in foo

# Sort the pairs by original index
new_list = sorted(foo, key=lambda x: x[0])

# Create the new column in the dataframe
index, values = zip(*new_list)  # unzip
df['New_column'] = values

Output

Out[85]:
     A      B         C         D  New_column
0  foo    one  0.039683 -0.041559    0.638594
1  bar    one -0.090650 -2.316097    0.000000
2  foo    two  0.024210  0.616764    0.629815
3  bar  three  0.142740  0.156198    0.450339
4  foo    two -1.085916 -0.432832    0.000000
5  bar    two  0.427604 -1.154850    1.000000
6  foo    one -0.156424  0.037188    0.527335
7  foo  three  0.676706 -1.336921    1.000000

NB: There may be a cleverer way to do this.

Answer 5

You first have to get rid of the first level of the MultiIndex created by groupby (i.e. 'foo' and 'bar').

Adding the following code should work:

normed = normed.reset_index(level=0)
del normed['A']
normed.rename(columns={'C':'C_normed'}, inplace=True)
pd.concat([df, normed], axis=1)

Result:

     A      B         C         D  C_normed
0  foo    one  1.697923  0.656727  1.000000
1  bar    one -0.626052 -0.466088  0.000000
2  foo    two -0.501440  1.080408  0.000000
3  bar  three  0.731791 -1.531915  1.000000
4  foo    two -0.202666  0.275042  0.135846
5  bar    two -0.340455 -0.737039  0.210332
6  foo    one  0.506664  1.049853  0.458362
7  foo  three -0.358317 -0.598262  0.065075

5 answers

solution1
3 ACCEPTED 2016-11-14 16:10:44

solution2
2 2016-11-14 16:14:36

solution3
2 2016-11-14 16:46:38

solution4
1 2016-11-14 16:09:15

solution5
1 2016-11-14 16:12:20
