Unexpected behavior of pandas groupby transform

I am reading the book 'Python for Data Analysis' and working through an example, reproduced in simplified form below.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'a' : [1,2, 3], 'b' : [3,4,6]}, index=['AA', 'BB', 'CC'])

In [313]: df1
Out[313]: 
    a  b
AA  1  3
BB  2  4
CC  3  6

In [314]: df1.groupby(['one', 'two', 'one']).mean()
Out[314]: 
     a    b
one  2  4.5
two  2  4.0

Now, when I use transform(np.mean) on the grouped DataFrame, I get:

In [315]: df1.groupby(['one', 'two', 'one']).transform(np.mean)
Out[315]: 
      a    b
AA  NaN  NaN
BB  NaN  NaN
CC  NaN  NaN
one   2  4.5
two   2  4.0

Based on the book and the documentation, I should get:

      a    b
AA    2  4.5
BB    2  4.0
CC    2  4.5
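
The expected result above can be checked with a short script. This is a minimal sketch that assumes a recent pandas release (where the problem no longer appears) and passes the string 'mean' instead of np.mean, which is a substitution made here and not part of the original example:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 6]}, index=['AA', 'BB', 'CC'])

# One group label per row: AA and CC fall in 'one', BB in 'two'.
key = ['one', 'two', 'one']

# transform broadcasts each group's mean back onto the rows of that group,
# so the result keeps the original index instead of the group labels.
result = df1.groupby(key).transform('mean')
print(result)
```

If the result is indexed by AA, BB, CC and contains no NaN rows, the installed version behaves as the book describes.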

Can somebody explain whether I am doing something wrong, or whether the behavior of pandas transform has changed?

For reference, for people who have the book, a similar example is on page 265 of "Python for Data Analysis" ( http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793/ref=sr_1_1?ie=UTF8&qid=1414333292&sr=8-1&keywords=python+for+data+analysis ).

EDIT:

This is the actual example in the book.

people = pd.DataFrame(np.random.randn(5,5), columns=list('abcde'), index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])

people.ix[2:3, ['b', 'c']] = np.nan

key = ['one', 'two', 'one', 'two', 'one']

people.groupby(key).transform(np.mean)

This should display the averages by key in a DataFrame with index = ['Joe', 'Steve', 'Wes', 'Jim', 'Travis'] and columns = list("abcde").

Instead I get:

               a         b         c         d         e
Jim          NaN       NaN       NaN       NaN       NaN
Joe          NaN       NaN       NaN       NaN       NaN
Steve        NaN       NaN       NaN       NaN       NaN
Travis       NaN       NaN       NaN       NaN       NaN
Wes          NaN       NaN       NaN       NaN       NaN
one     0.115921  0.269327 -0.812230  0.901449  0.100471
two    -1.371846 -0.918605 -0.391085 -0.425853  0.436742

I am using pandas version 0.14.1.

Upgrading pandas fixed the issue. It may have been a bug in the earlier version, but I am not sure.
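
For anyone rerunning the book example on a current pandas release, here is a hedged sketch. It assumes `.loc` with a positional slice of the index as a stand-in for `.ix` (which was removed in pandas 1.0), a seeded NumPy generator in place of np.random.randn, and the string 'mean' in place of np.mean. None of these substitutions come from the book.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
people = pd.DataFrame(rng.standard_normal((5, 5)),
                      columns=list('abcde'),
                      index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])

# Stand-in for people.ix[2:3, ['b', 'c']] = np.nan: blank out row 'Wes'.
people.loc[people.index[2:3], ['b', 'c']] = np.nan

key = ['one', 'two', 'one', 'two', 'one']

# Each row receives the mean of its own group; NaN values are skipped.
means = people.groupby(key).transform('mean')

# The result should line up with the original frame, not the group labels.
print(means.index.equals(people.index))
print(means.columns.equals(people.columns))
```

Rows that share a key (for example Joe, Wes and Travis) should show identical values in every column.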
