Pandas Dataframe Stacking versus Pivoting

Question

I'm using pandas to reshape some string/numeric-valued responses, and I've run into some behavior that's a bit counterintuitive.

Can someone explain the difference between the dataframes stacked and pivoted below, and why pivoted2 raises the DataError even though no aggfunc is passed?

import pandas as pd

d = {'ID': pd.Series(['x']*3 + ['y']*3,index = range(6)),
     'Count': pd.Series([1,2,1,1,1,1], index = range(6)),
     'Value_type': pd.Series(['foo','foo','bar','foo','bar','baz'], index = range(6)),
     'Value': pd.Series(range(1,7),index = range(6))}
df = pd.DataFrame(d)

d2 = {'ID': pd.Series(['x']*3 + ['y']*3,index = range(6)),
     'Count': pd.Series([1,2,1,1,1,1], index = range(6)),
     'Value_type': pd.Series(['foo','foo','bar','foo','bar','baz'], index = range(6)),
     'Value': pd.Series(list('abcdef'),index = range(6))}
df2 = pd.DataFrame(d2)

restacked = df.set_index(['ID','Count','Value_type']).unstack()
print restacked

restacked2 =  df2.set_index(['ID','Count','Value_type']).unstack()
print restacked2

pivoted = pd.pivot_table(df,rows = ['ID','Count'],cols = 'Value_type',values = 'Value')
print pivoted

## raises DataError('No numeric types to aggregate'), 
## even though no aggregation function is passed.
pivoted2 = pd.pivot_table(df2,rows = ['ID','Count'],cols = 'Value_type',values = 'Value')
print pivoted2

Answer 1

The default agg function is np.mean (even though you didn't pass it explicitly this is what is being used), which doesn't make sense on strings, in fact it raises an AttributeError when passed an object array - so pandas complains when you try to do this.

You could pass np.sum :

In [11]: pd.pivot_table(df2, rows=['ID', 'Count'], cols='Value_type',
                        values='Value', aggfunc=np.sum)
Out[11]: 
Value_type  bar  baz foo
ID Count                
x  1          c  NaN   a
   2        NaN  NaN   b
y  1          e    f   d

Or take the first item using iloc[0] :

In [12]: pd.pivot_table(df2, rows=['ID', 'Count'], cols='Value_type',
                        values='Value', aggfunc=lambda x: x.iloc[0])
Out[12]: 
Value_type  bar  baz foo
ID Count                
x  1          c  NaN   a
   2        NaN  NaN   b
y  1          e    f   d

Note: that this is the same as pivoted2['Value'] , you can make this output the same as pivoted2 if you pass a list to values to aggregate:

In [13]: pd.pivot_table(df2, rows=['ID', 'Count'], cols=['Value_type'], 
                        values=['Value'], aggfunc=lambda x: x.iloc[0])
Out[13]: 
           Value         
Value_type   bar  baz foo
ID Count                 
x  1           c  NaN   a
   2         NaN  NaN   b
y  1           e    f   d

Pandas Dataframe Stacking versus Pivoting

Question

1 answers

solution1
3 ACCPTED 2014-01-30 22:14:14

Pandas Dataframe Stacking versus Pivoting

Question

1 answers

solution1 3 ACCPTED 2014-01-30 22:14:14

solution1
3 ACCPTED 2014-01-30 22:14:14