I want to reshape a Pandas dataframe so that it has a new multi-index built from some of the original columns, and at the same time unstack some of the rows into columns. I can't work out how to do this, even after reading the tutorial on stacking and pivoting.
Basically, I have:
# fruit, year, variable, value
fruits = \
[('apples' , 2014, 'weight', 1.4),
('apples' , 2015, 'weight', 1.5),
('bananas', 2014, 'yield', 0.5),
('bananas', 2015, 'yield', 0.6),
('bananas', 2014, 'weight', 1.4)]
df = DataFrame(fruits)
The result should be:
 multi-index
/-----------\
fruit    year  weight  yield
apples   2014     1.4    NaN
         2015     1.5    NaN
bananas  2014     1.4    0.5
         2015     NaN    0.6
Any suggestions? Thanks.
The original dataframe has a column with values 'weight' or 'yield'. We want these to be column names (aka "column level values").
set_index can move column values into index level values. unstack can move index level values into column level values.
Put the two together and we get:
fruits = \
[('apples' , 2014, 'weight', 1.4),
('apples' , 2015, 'weight', 1.5),
('bananas', 2014, 'yield', 0.5),
('bananas', 2015, 'yield', 0.6),
('bananas', 2014, 'weight', 1.4)]
df = pd.DataFrame(fruits, columns='fruit year col val'.split())
df = df.set_index(['fruit', 'year', 'col'])
df = df.unstack(level='col')
df.columns = df.columns.droplevel(0)
which yields
col           weight  yield
fruit   year
apples  2014     1.4    NaN
        2015     1.5    NaN
bananas 2014     1.4    0.5
        2015     NaN    0.6
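Another option is to use pivot_table, applied to the original long-format df (that is, right after the pd.DataFrame(...) line, in place of the set_index and unstack steps):
df = df.pivot_table(index=['fruit', 'year'], columns='col')
df.columns = df.columns.droplevel(0)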
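To compare the two routes side by side, here is a minimal self-contained sketch. The names raw, via_unstack, via_pivot and dup are illustrative and not from the answer above; passing values='val' and selecting the 'val' column before unstacking are small variations that avoid the droplevel step. The second half adds a made-up duplicate row to illustrate one behavioural difference: pivot_table aggregates duplicate keys (mean by default), whereas unstack refuses to reshape them.

```python
import pandas as pd

fruits = [
    ("apples", 2014, "weight", 1.4),
    ("apples", 2015, "weight", 1.5),
    ("bananas", 2014, "yield", 0.5),
    ("bananas", 2015, "yield", 0.6),
    ("bananas", 2014, "weight", 1.4),
]
raw = pd.DataFrame(fruits, columns=["fruit", "year", "col", "val"])

# Route 1: set_index, then unstack the single value column (a Series),
# so the result has a flat column index and no droplevel is needed.
via_unstack = raw.set_index(["fruit", "year", "col"])["val"].unstack("col")

# Route 2: pivot_table with an explicit values column.
via_pivot = raw.pivot_table(index=["fruit", "year"], columns="col", values="val")

print(via_unstack)
print(via_pivot)

# Hypothetical extra row that duplicates the (apples, 2014, weight) key.
dup = pd.concat([raw, raw.iloc[[0]].assign(val=1.6)], ignore_index=True)

# pivot_table averages the duplicates (default aggregation is the mean).
averaged = dup.pivot_table(index=["fruit", "year"], columns="col", values="val")

# unstack cannot reshape an index with duplicate entries.
try:
    dup.set_index(["fruit", "year", "col"])["val"].unstack("col")
    unstack_failed = False
except ValueError:
    unstack_failed = True
```

If the data may contain repeated (fruit, year, col) combinations, pivot_table will silently average them, so the set_index/unstack route can double as a check that each combination is unique.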
First create the DataFrame using the list fruits and label the columns accordingly:
>>> df = pd.DataFrame(fruits, columns=['fruit', 'year', 'var', 'val'])
>>> df
     fruit  year     var  val
0   apples  2014  weight  1.4
1   apples  2015  weight  1.5
2  bananas  2014   yield  0.5
3  bananas  2015   yield  0.6
4  bananas  2014  weight  1.4
Then build the multi-index with the function pivot_table (note that the order of the elements in the index list matters, because it sets the order of the index levels):
>>> df1 = pd.pivot_table(df, values='val', index=['fruit', 'year'], columns='var')
var           weight  yield
fruit   year
apples  2014     1.4    NaN
        2015     1.5    NaN
bananas 2014     1.4    0.5
        2015     NaN    0.6
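The remark about the order of the index list can be checked with a short sketch. The names by_fruit, by_year and converted are illustrative only; the sketch assumes the same fruits data and column labels as above.

```python
import pandas as pd

fruits = [
    ("apples", 2014, "weight", 1.4),
    ("apples", 2015, "weight", 1.5),
    ("bananas", 2014, "yield", 0.5),
    ("bananas", 2015, "yield", 0.6),
    ("bananas", 2014, "weight", 1.4),
]
df = pd.DataFrame(fruits, columns=["fruit", "year", "var", "val"])

# The order of the list passed as index sets the order of the index levels.
by_fruit = pd.pivot_table(df, values="val", index=["fruit", "year"], columns="var")
by_year = pd.pivot_table(df, values="val", index=["year", "fruit"], columns="var")

print(list(by_fruit.index.names))  # ['fruit', 'year']
print(list(by_year.index.names))   # ['year', 'fruit']

# An existing result can be reordered afterwards by swapping the two
# levels and sorting the index again.
converted = by_year.swaplevel().sort_index()
print(converted)
```

Choosing the order up front is usually simpler, but swaplevel followed by sort_index is an option when the pivoted frame already exists.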
If you don't want the 'var' label on the column index, then df1.columns=['weight', 'yield'] gets rid of it:
>>> df1
              weight  yield
fruit   year
apples  2014     1.4    NaN
        2015     1.5    NaN
bananas 2014     1.4    0.5
        2015     NaN    0.6
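As a variation that does not hard-code the column names, the label can also be cleared by setting the name of the column index to None. This is a sketch under the same assumptions as above (same fruits list and column labels); flat is an illustrative name, and the reset_index step is optional, shown only in case ordinary columns are preferred over a multi-index.

```python
import pandas as pd

fruits = [
    ("apples", 2014, "weight", 1.4),
    ("apples", 2015, "weight", 1.5),
    ("bananas", 2014, "yield", 0.5),
    ("bananas", 2015, "yield", 0.6),
    ("bananas", 2014, "weight", 1.4),
]
df = pd.DataFrame(fruits, columns=["fruit", "year", "var", "val"])
df1 = pd.pivot_table(df, values="val", index=["fruit", "year"], columns="var")

# Clear the 'var' label without listing the column names by hand.
df1.columns.name = None
print(df1)

# Optional: turn the (fruit, year) multi-index back into ordinary columns.
flat = df1.reset_index()
print(flat)
```

This keeps working if more variables than weight and yield appear in the data, since nothing is typed out by hand.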