Pandas Dataframe Stacking versus Pivoting

I'm using pandas to reshape some string/numeric-valued responses, and I've run into some behavior that's a bit counterintuitive.

Can someone explain the difference between the dataframes restacked and pivoted below, and why pivoted2 raises a DataError even though no aggfunc is passed?

import pandas as pd

d = {'ID': pd.Series(['x'] * 3 + ['y'] * 3, index=range(6)),
     'Count': pd.Series([1, 2, 1, 1, 1, 1], index=range(6)),
     'Value_type': pd.Series(['foo', 'foo', 'bar', 'foo', 'bar', 'baz'], index=range(6)),
     'Value': pd.Series(range(1, 7), index=range(6))}
df = pd.DataFrame(d)

# Same frame, but with string values instead of numeric ones.
d2 = {'ID': pd.Series(['x'] * 3 + ['y'] * 3, index=range(6)),
      'Count': pd.Series([1, 2, 1, 1, 1, 1], index=range(6)),
      'Value_type': pd.Series(['foo', 'foo', 'bar', 'foo', 'bar', 'baz'], index=range(6)),
      'Value': pd.Series(list('abcdef'), index=range(6))}
df2 = pd.DataFrame(d2)

restacked = df.set_index(['ID', 'Count', 'Value_type']).unstack()
print(restacked)

restacked2 = df2.set_index(['ID', 'Count', 'Value_type']).unstack()
print(restacked2)

# index=/columns= were spelled rows=/cols= in old pandas releases.
pivoted = pd.pivot_table(df, index=['ID', 'Count'], columns='Value_type', values='Value')
print(pivoted)

## raises DataError('No numeric types to aggregate'),
## even though no aggregation function is passed.
pivoted2 = pd.pivot_table(df2, index=['ID', 'Count'], columns='Value_type', values='Value')
print(pivoted2)

The default aggfunc is np.mean, and it is used even though you didn't pass it explicitly. A mean doesn't make sense for strings (in fact np.mean raises an AttributeError when passed an object array), so pandas complains when you try to do this.
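To make the contrast between the two approaches concrete, here is a minimal sketch. It assumes a recent pandas (where the keywords are index/columns rather than rows/cols), and the names reshaped and error are only for illustration. unstack merely moves an index level into the columns and never aggregates, so it is happy with strings; pivot_table groups first and then applies aggfunc, which defaults to the mean.

```python
import pandas as pd

df2 = pd.DataFrame({
    'ID': ['x'] * 3 + ['y'] * 3,
    'Count': [1, 2, 1, 1, 1, 1],
    'Value_type': ['foo', 'foo', 'bar', 'foo', 'bar', 'baz'],
    'Value': list('abcdef'),
})

# unstack only reshapes: every (ID, Count, Value_type) key is unique,
# so each string lands in exactly one cell and nothing is aggregated.
reshaped = df2.set_index(['ID', 'Count', 'Value_type']).unstack()

# pivot_table always aggregates; with no aggfunc it takes the mean,
# which is undefined for strings. The exception class depends on the
# pandas version, so this sketch catches broadly.
try:
    pd.pivot_table(df2, index=['ID', 'Count'], columns='Value_type',
                   values='Value')
    error = None
except Exception as exc:
    error = exc
```

If the index keys were not unique, unstack would fail instead, because it has no rule for combining duplicates; that is the job aggfunc does in pivot_table.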

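You could pass np.sum instead (after import numpy as np). Summing strings concatenates them, and since each cell here holds exactly one value, that value comes back unchanged:

In [11]: pd.pivot_table(df2, index=['ID', 'Count'], columns='Value_type',
                        values='Value', aggfunc=np.sum)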
Out[11]: 
Value_type  bar  baz foo
ID Count                
x  1          c  NaN   a
   2        NaN  NaN   b
y  1          e    f   d
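The sum is harmless above only because no cell has more than one string to combine. As a hypothetical variation (not part of the original question), dropping Count from the index puts both 'a' and 'b' into the same cell, and the sum then concatenates them. This sketch assumes a recent pandas and uses the string name 'sum' in place of np.sum:

```python
import pandas as pd

df2 = pd.DataFrame({
    'ID': ['x'] * 3 + ['y'] * 3,
    'Count': [1, 2, 1, 1, 1, 1],
    'Value_type': ['foo', 'foo', 'bar', 'foo', 'bar', 'baz'],
    'Value': list('abcdef'),
})

# Without Count in the index, the ('x', 'foo') cell covers two rows
# ('a' and 'b'), so the aggregation has real work to do.
summed = pd.pivot_table(df2, index='ID', columns='Value_type',
                        values='Value', aggfunc='sum')

combined = summed.loc['x', 'foo']   # strings in the group, concatenated
single = summed.loc['y', 'baz']     # a one-element group is unchanged
```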

Or take the first item in each group using iloc[0]:

In [12]: pd.pivot_table(df2, index=['ID', 'Count'], columns='Value_type',
                        values='Value', aggfunc=lambda x: x.iloc[0])
Out[12]: 
Value_type  bar  baz foo
ID Count                
x  1          c  NaN   a
   2        NaN  NaN   b
y  1          e    f   d
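As an alternative to the lambda, pandas also accepts the aggregation name 'first' as a string. The sketch below, assuming a recent pandas, runs both spellings side by side; the names by_name and by_lambda are illustrative only.

```python
import pandas as pd

df2 = pd.DataFrame({
    'ID': ['x'] * 3 + ['y'] * 3,
    'Count': [1, 2, 1, 1, 1, 1],
    'Value_type': ['foo', 'foo', 'bar', 'foo', 'bar', 'baz'],
    'Value': list('abcdef'),
})

# Built-in aggregation name: first value of each group.
by_name = pd.pivot_table(df2, index=['ID', 'Count'], columns='Value_type',
                         values='Value', aggfunc='first')

# The lambda from the answer: first element of each group by position.
by_lambda = pd.pivot_table(df2, index=['ID', 'Count'], columns='Value_type',
                           values='Value', aggfunc=lambda x: x.iloc[0])
```

One difference to keep in mind: 'first' skips missing values within a group, while x.iloc[0] returns whatever sits in the first position. With this data there are no missing inputs, so the two agree.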

Note that this is the same as restacked2['Value'] (the unstacked frame from the question). To get output with the same column layout as restacked2 itself, pass a list as values:

In [13]: pd.pivot_table(df2, index=['ID', 'Count'], columns=['Value_type'],
                        values=['Value'], aggfunc=lambda x: x.iloc[0])
Out[13]: 
           Value         
Value_type   bar  baz foo
ID Count                 
x  1           c  NaN   a
   2         NaN  NaN   b
y  1           e    f   d
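As a sanity check on this output, the following sketch compares the list-valued pivot_table with the unstacked frame from the question (restacked2). It assumes a recent pandas and swaps in the built-in 'first' aggregation for the lambda; the names same_layout and same_values are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    'ID': ['x'] * 3 + ['y'] * 3,
    'Count': [1, 2, 1, 1, 1, 1],
    'Value_type': ['foo', 'foo', 'bar', 'foo', 'bar', 'baz'],
    'Value': list('abcdef'),
})

restacked2 = df2.set_index(['ID', 'Count', 'Value_type']).unstack()

# Passing values as a list keeps 'Value' as the top column level.
pivoted = pd.pivot_table(df2, index=['ID', 'Count'], columns=['Value_type'],
                         values=['Value'], aggfunc='first')

# Compare row labels, column labels, and cell contents separately.
same_layout = (list(restacked2.index) == list(pivoted.index)
               and list(restacked2.columns) == list(pivoted.columns))
# Missing cells are filled with '' so they compare equal to each other.
same_values = (restacked2.fillna('').values.tolist()
               == pivoted.fillna('').values.tolist())
```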
