简体   繁体   中英

How do I take 1 column of values and put some of those values in a new column based on a boolean flag column?

Say I have the following 2 dimensional dataframe

+--------+-------------------+------------+
| Index, | Module/Line Item, | Is Module, |
+--------+-------------------+------------+
| 0,     | Module 1,         | True,      |
|--------|-------------------|------------|
| 1,     | Line Item 1,      | False,     |
|--------|-------------------|------------|
| 2,     | Line Item 2,      | False,     |
|--------|-------------------|------------|
| 3,     | Module 2,         | True,      |
|--------|-------------------|------------|
| 4,     | Line Item 1,      | False,     |
|--------|-------------------|------------|
| 5,     | Line Item 2,      | False      |
+--------+-------------------+------------+

And I want it to turn into this:

+----------+-------------+
| Module   | Line Item   |
+----------+-------------+
| Module 1 | Line Item 1 |
|          |-------------|
|          | Line Item 2 |
|----------|-------------|
| Module 2 | Line Item 1 |
|          |-------------|
|          | Line Item 2 |
+----------+-------------+

What would be the best way to accomplish that? I tried pivot_table and groupby but I couldn't get either to work the way that I wanted. Note there are not a set number of Line Items between Modules, and no patterns in the names. The "Is Module" column is the only indicator of whether the value is a module and should be pivoted. All Line items that appear beneath the module until the next module should belong to that module when pivoted.

This is not answered by How to pivot a dataframe because it never explains how to split a column into a hierarchy based on the values that are given in another column.

Use where for replace False values by Is Module by forward filling, rename columns name and last filter by boolean indexing with loc for filter also columns names:

df['Module'] = df['Module/Line Item'].where(df['Is Module']).ffill()
df = df.rename(columns={'Module/Line Item':'Line Item'})
df = df.loc[~df['Is Module'], ['Module','Line Item']]
print (df)
     Module    Line Item
1  Module 1  Line Item 1
2  Module 1  Line Item 2
4  Module 2  Line Item 1
5  Module 2  Line Item 2

If need also replace duplicated values by Module with empty values:

df['Module'] = df['Module'].mask(df['Module'].duplicated(), '')
print (df)
     Module    Line Item
1  Module 1  Line Item 1
2            Line Item 2
4  Module 2  Line Item 1
5            Line Item 2

Another solution, using groupby

df.groupby(df['Is Module'].cumsum())['Module/Line Item']\
.apply(lambda g: pd.DataFrame({'Module':g.iloc[0],
                               'Line Item': g.iloc[1:].values}))\
.set_index('Module')

            Line Item
Module  
Module 1    Line Item 1
            Line Item 2
Module 2    Line Item 1
            Line Item 2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM