I am reading the 'Python for Data Analysis' book and I was working through an example as prototyped below.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'a' : [1,2, 3], 'b' : [3,4,6]}, index=['AA', 'BB', 'CC'])
In [313]: df1
Out[313]:
a b
AA 1 3
BB 2 4
CC 3 6
In [314]: df1.groupby(['one', 'two', 'one']).mean()
Out[314]:
a b
one 2 4.5
two 2 4.0
Now, when I use transform(np.mean) on the DataFrame, I get:
In [315]: df1.groupby(['one', 'two', 'one']).transform(np.mean)
Out[315]:
a b
AA NaN NaN
BB NaN NaN
CC NaN NaN
one 2 4.5
two 2 4.0
Based on the book and the documentation, I should get:
a b
AA 2 4.5
BB 2 4.0
CC 2 4.5
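The expected result follows from how transform is described: it computes one value per group and broadcasts it back onto the original rows, so the output keeps the original index (AA, BB, CC) rather than the group labels. Here is a minimal sketch of that, assuming a recent pandas release. Passing the string 'mean' instead of np.mean is my substitution, not the book's.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 6]},
                   index=['AA', 'BB', 'CC'])

# One group label per row: AA and CC fall in 'one', BB in 'two'.
key = ['one', 'two', 'one']

# transform broadcasts each group's mean back to that group's rows,
# so the result has the same index as df1.
result = df1.groupby(key).transform('mean')
print(result)
# column a: [2.0, 2.0, 2.0], column b: [4.5, 4.0, 4.5]
```

If the printed frame also contains rows labelled 'one' and 'two', the installed pandas is not behaving as documented.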
Can somebody explain whether I am doing something wrong, or whether the behavior of pandas transform has changed?
For reference, for people who have the book, a similar example is on page 265 of "Python for Data Analysis" ( http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793/ref=sr_1_1?ie=UTF8&qid=1414333292&sr=8-1&keywords=python+for+data+analysis ).
EDIT:
This is the actual example in the book.
people = pd.DataFrame(np.random.randn(5,5), columns=list('abcde'), index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
people.ix[2:3, ['b', 'c']] = np.nan
key = ['one', 'two', 'one', 'two', 'one']
people.groupby(key).transform(np.mean)
This should display the averages by key in a DataFrame with index = ['Joe', 'Steve', 'Wes', 'Jim', 'Travis'] and columns = list("abcde"). Instead I get:
a b c d e
Jim NaN NaN NaN NaN NaN
Joe NaN NaN NaN NaN NaN
Steve NaN NaN NaN NaN NaN
Travis NaN NaN NaN NaN NaN
Wes NaN NaN NaN NaN NaN
one 0.115921 0.269327 -0.812230 0.901449 0.100471
two -1.371846 -0.918605 -0.391085 -0.425853 0.436742
I am actually using pandas version 0.14.1.
Updating my pandas version fixed the issue. It may have been a bug in the previous version, but I am not sure.
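For anyone reproducing the book example on a current pandas release, here is a minimal sketch. Three details are my assumptions rather than the book's: .ix has since been removed from pandas, so the NaN assignment uses positional .iloc for the same row and columns; the aggregation is passed as the string 'mean' instead of np.mean; and the seeded random generator is only there to make the run repeatable.

```python
import numpy as np
import pandas as pd

print(pd.__version__)  # the behavior in the question was seen on 0.14.1

names = ['Joe', 'Steve', 'Wes', 'Jim', 'Travis']
rng = np.random.default_rng(0)  # assumption: seeded only for repeatability
people = pd.DataFrame(rng.standard_normal((5, 5)),
                      columns=list('abcde'), index=names)

# Positional equivalent of people.ix[2:3, ['b', 'c']] = np.nan:
# row 2 is 'Wes', columns 1 and 2 are 'b' and 'c'.
people.iloc[2:3, [1, 2]] = np.nan

key = ['one', 'two', 'one', 'two', 'one']
means = people.groupby(key).transform('mean')

# The result is indexed by the original names, not by 'one'/'two',
# and every row in a group carries that group's mean (NaNs are skipped).
print(means)
```

Every row of means should be labelled with a person's name, and Joe, Wes and Travis should share identical values because they belong to the same group.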