I'm using pandas to reshape some string/numeric-valued responses, and I've run into some behavior that's a bit counterintuitive.
Can someone explain the difference between the dataframes stacked
and pivoted
below, and why pivoted2
raises the DataError
even though no aggfunc
is passed?
import pandas as pd
d = {'ID': pd.Series(['x']*3 + ['y']*3,index = range(6)),
'Count': pd.Series([1,2,1,1,1,1], index = range(6)),
'Value_type': pd.Series(['foo','foo','bar','foo','bar','baz'], index = range(6)),
'Value': pd.Series(range(1,7),index = range(6))}
df = pd.DataFrame(d)
d2 = {'ID': pd.Series(['x']*3 + ['y']*3,index = range(6)),
'Count': pd.Series([1,2,1,1,1,1], index = range(6)),
'Value_type': pd.Series(['foo','foo','bar','foo','bar','baz'], index = range(6)),
'Value': pd.Series(list('abcdef'),index = range(6))}
df2 = pd.DataFrame(d2)
restacked = df.set_index(['ID','Count','Value_type']).unstack()
print restacked
restacked2 = df2.set_index(['ID','Count','Value_type']).unstack()
print restacked2
pivoted = pd.pivot_table(df,rows = ['ID','Count'],cols = 'Value_type',values = 'Value')
print pivoted
## raises DataError('No numeric types to aggregate'),
## even though no aggregation function is passed.
pivoted2 = pd.pivot_table(df2,rows = ['ID','Count'],cols = 'Value_type',values = 'Value')
print pivoted2
The default agg function is np.mean
(even though you didn't pass it explicitly this is what is being used), which doesn't make sense on strings, in fact it raises an AttributeError when passed an object array - so pandas complains when you try to do this.
You could pass np.sum
:
In [11]: pd.pivot_table(df2, rows=['ID', 'Count'], cols='Value_type',
values='Value', aggfunc=np.sum)
Out[11]:
Value_type bar baz foo
ID Count
x 1 c NaN a
2 NaN NaN b
y 1 e f d
Or take the first item using iloc[0]
:
In [12]: pd.pivot_table(df2, rows=['ID', 'Count'], cols='Value_type',
values='Value', aggfunc=lambda x: x.iloc[0])
Out[12]:
Value_type bar baz foo
ID Count
x 1 c NaN a
2 NaN NaN b
y 1 e f d
Note: that this is the same as pivoted2['Value']
, you can make this output the same as pivoted2
if you pass a list to values to aggregate:
In [13]: pd.pivot_table(df2, rows=['ID', 'Count'], cols=['Value_type'],
values=['Value'], aggfunc=lambda x: x.iloc[0])
Out[13]:
Value
Value_type bar baz foo
ID Count
x 1 c NaN a
2 NaN NaN b
y 1 e f d
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.