简体   繁体   中英

Rearange rows in a pandas DataFrame based on MultiIndex values the pandas-way

I really want to know how this could be done in an elegant way with pandas. I have this DataFrame where the index is a MultiIndex with the two values representing a year and a catagory.

|               |   VAL |
|:--------------|------:|
| ('2018', 'A') |     0 |
| ('2018', 'B') |     1 |
| ('2018', 'C') |     2 |
| ('2019', 'A') |     0 |
| ('2019', 'B') |     1 |
| ('2019', 'C') |     2 |
| ('2020', 'A') |     0 |
| ('2020', 'B') |     1 |
| ('2020', 'C') |     2 |

I want it to look like this:

| CAT |  YEAR |  VAL |
|:----|-------|-----:|
| 'A' |  2018 |    0 |
|     |  2019 |    0 |
|     |  2020 |    0 |
| 'B' |  2018 |    1 |
|     |  2019 |    1 |
|     |  2020 |    1 |
| 'C' |  2018 |    2 |
|     |  2019 |    2 |
|     |  2020 |    2 |
  • The first column is based on the second level of the MultiIndex.
  • The second column is based on the first level of the MultiIndex.

I have no idea how pandas can help me here.

This is the MWE do produce the sample data.

#!/usr/bin/env python3
import pandas as pd

# sub DataFrame
df = pd.DataFrame(
    {
        'CAT': list('ABC'),
        'VAL': range(3)
    }
)
df = df.set_index('CAT')
print(df)

# dict of 3 DataFrame's
d = {
    '2018': df.copy(),
    '2019': df.copy(),
    '2020': df.copy(),
}

# glue them together
df = pd.concat(d)
print(df.to_markdown())
In [25]: df.swaplevel().sort_index(level=0)
Out[25]:
          VAL
CAT
A   2018    0
    2019    0
    2020    0
B   2018    1
    2019    1
    2020    1
C   2018    2
    2019    2
    2020    2

Use df.swaplevel then sort the index

If you want those actual columns (with empty strings where there's duplicates) instead of just swapping the MultiIndex, then see @Quang's answer

alternatively, building on the above


In [26]: df2 = df.swaplevel().sort_index(level=0).reset_index().rename(columns={'level_1': 'Year'})
     ...: df2['CAT'] = np.where(df2['CAT'].duplicated(), '', df2['CAT'])

In [27]: df2
Out[130]:
  CAT  Year  VAL
0   A  2018    0
1      2019    0
2      2020    0
3   B  2018    1
4      2019    1
5      2020    1
6   C  2018    2
7      2019    2
8      2020    2

You can try:

# insert the YEAR column
df.insert(0, 'YEAR', df.index.get_level_values(0))

# insert the CAT column
df.insert(0, 'CAT', df.index.get_level_values(1))

# remove the index and sort values
df = df.reset_index(drop=True).sort_values(['CAT','YEAR'])

# mask the duplicates with ''
df.loc[df['CAT'].duplicated(),'CAT'] = ''

Output:

  CAT  YEAR  VAL
0   A  2018    0
3      2019    0
6      2020    0
1   B  2018    1
4      2019    1
7      2020    1
2   C  2018    2
5      2019    2
8      2020    2

Is this what you're after?

df.reset_index().rename(columns={'level_0': 'YEAR'}).sort_values(['CAT', 'YEAR'])[['CAT', 'YEAR', 'VAL']].reset_index(drop=True)

  CAT  YEAR  VAL
0   A  2018    0
1   A  2019    0
2   A  2020    0
3   B  2018    1
4   B  2019    1
5   B  2020    1
6   C  2018    2
7   C  2019    2
8   C  2020    2

iNPUT

VAL
('2018', 'A') 0
('2018', 'B') 1
('2018', 'C') 2
('2019', 'A') 0
('2019', 'B') 1
('2019', 'C') 2
('2020', 'A') 0
('2020', 'B') 1
('2020', 'C') 2
df.insert(0, '{YEAR}', df.index.get_level_values(0))
df.insert(0, '{CAT}', df.index.get_level_values(1))
df = df.reset_index(drop=True).sort_values(['{CAT}','{YEAR}'])
df.loc[df['CAT'].duplicated(),'CAT'] = ''

OUTPUT

CAT YEAR VAL
'A' 2018 0
2019 0
2020 0
'B' 2018 1
2019 1
2020 1
'C' 2018 2
2019 2
2020 2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM