Say I have the following two-dimensional DataFrame:
+-------+------------------+-----------+
| Index | Module/Line Item | Is Module |
+-------+------------------+-----------+
| 0     | Module 1         | True      |
| 1     | Line Item 1      | False     |
| 2     | Line Item 2      | False     |
| 3     | Module 2         | True      |
| 4     | Line Item 1      | False     |
| 5     | Line Item 2      | False     |
+-------+------------------+-----------+
And I want it to turn into this:
+----------+-------------+
| Module   | Line Item   |
+----------+-------------+
| Module 1 | Line Item 1 |
|          |-------------|
|          | Line Item 2 |
|----------|-------------|
| Module 2 | Line Item 1 |
|          |-------------|
|          | Line Item 2 |
+----------+-------------+
What would be the best way to accomplish that? I tried pivot_table and groupby, but I couldn't get either to work the way I wanted. Note that there is no set number of Line Items between Modules, and no pattern in the names. The "Is Module" column is the only indicator of whether a value is a module and should be pivoted. All Line Items that appear beneath a module, up to the next module, should belong to that module when pivoted.
This is not answered by "How to pivot a dataframe" because that question never explains how to split a column into a hierarchy based on the values given in another column.
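Use where to keep the values only on rows where Is Module is True (all other rows become NaN), then forward fill them with ffill so every line item carries its module name. Rename the original column, and finally filter with boolean indexing in loc, which drops the module rows and also selects the columns by name: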
df['Module'] = df['Module/Line Item'].where(df['Is Module']).ffill()
df = df.rename(columns={'Module/Line Item':'Line Item'})
df = df.loc[~df['Is Module'], ['Module','Line Item']]
print (df)
     Module    Line Item
1  Module 1  Line Item 1
2  Module 1  Line Item 2
4  Module 2  Line Item 1
5  Module 2  Line Item 2
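For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same where / ffill / loc steps. The DataFrame construction is my reconstruction of the sample data in the question, and the intermediate names module and result are only illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question (reconstructed by hand).
df = pd.DataFrame({
    "Module/Line Item": ["Module 1", "Line Item 1", "Line Item 2",
                         "Module 2", "Line Item 1", "Line Item 2"],
    "Is Module": [True, False, False, True, False, False],
})

# Keep the name only on module rows (NaN elsewhere), then carry it downward.
module = df["Module/Line Item"].where(df["Is Module"]).ffill()

# Attach the module column, rename the original column,
# and drop the module rows themselves.
result = (
    df.assign(Module=module)
      .rename(columns={"Module/Line Item": "Line Item"})
      .loc[lambda d: ~d["Is Module"], ["Module", "Line Item"]]
)
print(result)
```

Building result through a chain instead of reassigning df leaves the original frame untouched, which makes it easier to rerun the steps while experimenting.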
If you also need to replace the duplicated values in Module with empty strings:
df['Module'] = df['Module'].mask(df['Module'].duplicated(), '')
print (df)
     Module    Line Item
1  Module 1  Line Item 1
2            Line Item 2
4  Module 2  Line Item 1
5            Line Item 2
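One thing to watch for: duplicated() looks at the whole column, so if the same module name is reused by a later, separate module, that later block would be blanked completely. The question does not say whether names can repeat, so the data below is hypothetical. A possible sketch that blanks a name only when the previous row belongs to the same module block, using the cumulative sum of Is Module as a block number:

```python
import pandas as pd

# Hypothetical data: the name "Module 1" is reused by a later, separate module.
df = pd.DataFrame({
    "Module/Line Item": ["Module 1", "Line Item 1", "Line Item 2",
                         "Module 2", "Line Item 1",
                         "Module 1", "Line Item 9"],
    "Is Module": [True, False, False, True, False, True, False],
})

is_item = ~df["Is Module"]

# Number each module block (1, 2, 3, ...) so repeated names stay distinct.
block = df["Is Module"].cumsum()[is_item]

# Same where/ffill step as above, keeping only the line item rows.
module = df["Module/Line Item"].where(df["Is Module"]).ffill()[is_item]

# Blank the module name only when the previous kept row is in the same block.
out = pd.DataFrame({
    "Module": module.mask(block.eq(block.shift()), ""),
    "Line Item": df.loc[is_item, "Module/Line Item"],
})
print(out)
```

With the original sample data this gives the same result as the duplicated() version, so it only matters if module names are not unique.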
Another solution, using groupby with the cumulative sum of Is Module as the grouping key:
df.groupby(df['Is Module'].cumsum())['Module/Line Item']\
.apply(lambda g: pd.DataFrame({'Module':g.iloc[0],
'Line Item': g.iloc[1:].values}))\
.set_index('Module')
Line Item
Module
Module 1 Line Item 1
Line Item 2
Module 2 Line Item 1
Line Item 2
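To see why grouping by the cumulative sum works, it may help to look at the key on its own. The sketch below rebuilds the sample data from the question and, purely for illustration, collects each group into a plain dict; the names key and items are mine, and the dict form assumes module names are unique.

```python
import pandas as pd

# Rebuild the sample data from the question (reconstructed by hand).
df = pd.DataFrame({
    "Module/Line Item": ["Module 1", "Line Item 1", "Line Item 2",
                         "Module 2", "Line Item 1", "Line Item 2"],
    "Is Module": [True, False, False, True, False, False],
})

# True counts as 1 and False as 0, so the running total increases by one
# at every module row and stays constant across its line items.
key = df["Is Module"].cumsum()
print(key.tolist())

# Each group starts with the module row, followed by its line items.
items = {
    group.iloc[0]: group.iloc[1:].tolist()
    for _, group in df["Module/Line Item"].groupby(key)
}
print(items)
```

Because the key depends only on the position of the module rows, it works for any number of line items per module, which is the requirement stated in the question.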