
Pandas reshaping

I want to reshape a Pandas dataframe so that it has a new multi-index built from some of the original columns, while unstacking some of the rows at the same time. I could not work out how to do this, even after reading the tutorial on stacking and pivoting.

Basically, I have:

import pandas as pd

# columns: fruit, year, variable, value
fruits = [
    ('apples' , 2014, 'weight', 1.4),
    ('apples' , 2015, 'weight', 1.5),
    ('bananas', 2014, 'yield', 0.5),
    ('bananas', 2015, 'yield', 0.6),
    ('bananas', 2014, 'weight', 1.4),
]
df = pd.DataFrame(fruits)

The result should be:

 multi-index
/----------\
fruit   year   weight yield
apples  2014   1.4    NaN
        2015   1.5    NaN
bananas 2014   1.4    0.5
        2015   NaN    0.6

Any suggestions? Thanks.

The original dataframe has a column whose values are weight or yield. We want these values to become column names (aka "column level values").

set_index can move column values into index level values. unstack can move index level values into column level values.

Put the two together and we get:

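import pandas as pd

fruits = [
    ('apples' , 2014, 'weight', 1.4),
    ('apples' , 2015, 'weight', 1.5),
    ('bananas', 2014, 'yield', 0.5),
    ('bananas', 2015, 'yield', 0.6),
    ('bananas', 2014, 'weight', 1.4),
]
df = pd.DataFrame(fruits, columns='fruit year col val'.split())
# move fruit, year and col into the index
df = df.set_index(['fruit', 'year', 'col'])
# move the col index level into the columns
df = df.unstack(level='col')
# the columns are now a (val, col) MultiIndex; drop the outer 'val' level
df.columns = df.columns.droplevel(0)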

which yields

col           weight  yield
fruit   year               
apples  2014     1.4    NaN
        2015     1.5    NaN
bananas 2014     1.4    0.5
        2015     NaN    0.6
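As a variation on the steps above, selecting the value column before unstacking gives a Series, and unstacking a Series returns a DataFrame with a single column level, so the droplevel step is not needed. This is only a sketch; the name wide is chosen here for illustration and does not appear in the answer.

```python
import pandas as pd

fruits = [
    ("apples", 2014, "weight", 1.4),
    ("apples", 2015, "weight", 1.5),
    ("bananas", 2014, "yield", 0.5),
    ("bananas", 2015, "yield", 0.6),
    ("bananas", 2014, "weight", 1.4),
]
df = pd.DataFrame(fruits, columns=["fruit", "year", "col", "val"])

# Picking 'val' turns the indexed frame into a Series, so unstack
# produces plain 'weight' / 'yield' columns with no extra level.
wide = df.set_index(["fruit", "year", "col"])["val"].unstack("col")
print(wide)
```

The result should match the table above, with col as the name of the column axis.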

Another option is to use pivot_table on the DataFrame as first constructed (that is, before set_index and unstack are applied):

df = df.pivot_table(index=['fruit', 'year'], columns='col')
df.columns = df.columns.droplevel(0)
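One practical difference between the two approaches shows up when the same (fruit, year, col) combination occurs more than once: unstack raises a ValueError, while pivot_table aggregates the duplicates, using the mean by default. The sketch below uses made-up rows (not the data from the question) to illustrate this.

```python
import pandas as pd

# Hypothetical data: (bananas, 2014, weight) appears twice.
rows = [
    ("bananas", 2014, "weight", 1.4),
    ("bananas", 2014, "yield", 0.5),
    ("bananas", 2014, "weight", 1.6),
]
df = pd.DataFrame(rows, columns=["fruit", "year", "col", "val"])

# unstack cannot reshape when an index entry is duplicated.
try:
    df.set_index(["fruit", "year", "col"])["val"].unstack("col")
    unstack_failed = False
except ValueError:
    unstack_failed = True

# pivot_table collapses the duplicates with its default aggfunc (mean).
averaged = df.pivot_table(index=["fruit", "year"], columns="col", values="val")
print(unstack_failed)
print(averaged)
```

If averaging is not the desired behaviour, the aggfunc parameter of pivot_table accepts another aggregation such as "sum" or "first".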

First create the DataFrame using the list fruits and label the columns accordingly:

>>> df = pd.DataFrame(fruits, columns=['fruit', 'year', 'var', 'val'])
>>> df
     fruit  year     var  val
0   apples  2014  weight  1.4
1   apples  2015  weight  1.5
2  bananas  2014   yield  0.5
3  bananas  2015   yield  0.6
4  bananas  2014  weight  1.4

Build the multi-index with the pivot_table function (note that the order of the names in the index list determines the order of the index levels):

>>> df1 = pd.pivot_table(df, values='val', index=['fruit', 'year'], columns='var')
>>> df1
var           weight  yield
fruit   year               
apples  2014     1.4    NaN
        2015     1.5    NaN
bananas 2014     1.4    0.5
        2015     NaN    0.6

If you don't want the 'var' label on the column axis, assigning a plain list of column names removes it:

>>> df1.columns = ['weight', 'yield']
>>> df1
              weight  yield
fruit   year               
apples  2014     1.4    NaN
        2015     1.5    NaN
bananas 2014     1.4    0.5
        2015     NaN    0.6
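A possible alternative that avoids retyping the column names is to clear only the name of the column axis with rename_axis. This is a sketch rather than part of the answer above; df2 is a name introduced here for illustration.

```python
import pandas as pd

fruits = [
    ("apples", 2014, "weight", 1.4),
    ("apples", 2015, "weight", 1.5),
    ("bananas", 2014, "yield", 0.5),
    ("bananas", 2015, "yield", 0.6),
    ("bananas", 2014, "weight", 1.4),
]
df = pd.DataFrame(fruits, columns=["fruit", "year", "var", "val"])
df1 = pd.pivot_table(df, values="val", index=["fruit", "year"], columns="var")

# Clear the 'var' label of the column axis; the column names are untouched.
# rename_axis returns a new DataFrame, so df1 keeps its label.
df2 = df1.rename_axis(columns=None)
print(df2)
```

This keeps working unchanged if more variables than weight and yield appear in the data.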


 