How do I take 1 column of values and put some of those values in a new column based on a boolean flag column?

Question

Say I have the following two-dimensional DataFrame:

+-------+------------------+-----------+
| Index | Module/Line Item | Is Module |
+-------+------------------+-----------+
| 0     | Module 1         | True      |
|-------|------------------|-----------|
| 1     | Line Item 1      | False     |
|-------|------------------|-----------|
| 2     | Line Item 2      | False     |
|-------|------------------|-----------|
| 3     | Module 2         | True      |
|-------|------------------|-----------|
| 4     | Line Item 1      | False     |
|-------|------------------|-----------|
| 5     | Line Item 2      | False     |
+-------+------------------+-----------+

And I want it to turn into this:

+----------+-------------+
| Module   | Line Item   |
+----------+-------------+
| Module 1 | Line Item 1 |
|          |-------------|
|          | Line Item 2 |
|----------|-------------|
| Module 2 | Line Item 1 |
|          |-------------|
|          | Line Item 2 |
+----------+-------------+

What would be the best way to accomplish that? I tried pivot_table and groupby, but I couldn't get either to work the way I wanted. Note that there is no set number of Line Items between Modules, and there are no patterns in the names. The "Is Module" column is the only indicator of whether a value is a module and should be pivoted. All Line Items that appear beneath a module, up to the next module, should belong to that module when pivoted.

This is not answered by "How to pivot a dataframe" because that question never explains how to split a column into a hierarchy based on the values given in another column.

Answer 1

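Use where to keep the values only in the rows where Is Module is True (the other rows become NaN), forward-fill those values with ffill, rename the original column, and finally use boolean indexing with loc to keep only the line-item rows and the two columns you need: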

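# Keep module names only where the flag is True, then carry them down
df['Module'] = df['Module/Line Item'].where(df['Is Module']).ffill()
df = df.rename(columns={'Module/Line Item':'Line Item'})
# Drop the module rows and keep only the two output columns
df = df.loc[~df['Is Module'], ['Module','Line Item']]
print(df)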
     Module    Line Item
1  Module 1  Line Item 1
2  Module 1  Line Item 2
4  Module 2  Line Item 1
5  Module 2  Line Item 2
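
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question, and the intermediate name module_only is only for illustration; it is not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "Module/Line Item": ["Module 1", "Line Item 1", "Line Item 2",
                         "Module 2", "Line Item 1", "Line Item 2"],
    "Is Module": [True, False, False, True, False, False],
})

# where() keeps a value only where the flag is True; other rows become NaN
module_only = df["Module/Line Item"].where(df["Is Module"])

# ffill() carries each module name down to the line items beneath it
df["Module"] = module_only.ffill()

# Keep only the line-item rows and the two columns of interest
result = (
    df.loc[~df["Is Module"], ["Module", "Module/Line Item"]]
      .rename(columns={"Module/Line Item": "Line Item"})
)
print(result)
```

Splitting where and ffill into two statements makes it easier to inspect the NaN pattern before it is filled.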

If you also need to replace the repeated Module values with empty strings:

# Blank out every occurrence of a module name after its first
df['Module'] = df['Module'].mask(df['Module'].duplicated(), '')
print(df)
     Module    Line Item
1  Module 1  Line Item 1
2            Line Item 2
4  Module 2  Line Item 1
5            Line Item 2
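
One caveat worth checking against your own data: duplicated() flags every later occurrence of a name, so if the same module name came back in a later, separate block, its first row would be blanked as well. The sketch below is a possible variant, not part of the original answer, that compares each row only with the row directly above it using shift. The sample data, with "Module 1" appearing twice, is hypothetical and only there to show the difference.

```python
import pandas as pd

# Hypothetical data: "Module 1" appears again as a separate, later block
df = pd.DataFrame({
    "Module/Line Item": ["Module 1", "Line Item 1", "Line Item 2",
                         "Module 2", "Line Item 1",
                         "Module 1", "Line Item 3"],
    "Is Module": [True, False, False, True, False, True, False],
})

df["Module"] = df["Module/Line Item"].where(df["Is Module"]).ffill()
out = (
    df.loc[~df["Is Module"], ["Module", "Module/Line Item"]]
      .rename(columns={"Module/Line Item": "Line Item"})
)

# True only where a row repeats the module of the row directly above it
repeats_previous = out["Module"].eq(out["Module"].shift())
out["Module"] = out["Module"].mask(repeats_previous, "")
print(out)
```

If two neighbouring blocks could share the same name, this comparison would merge them visually; in that case grouping on df['Is Module'].cumsum() is likely the safer key.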

Answer 2

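Another solution, using groupby on the cumulative sum of Is Module, which gives every module block its own group number. The first value of each group is the module and the remaining values are its line items: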

df.groupby(df['Is Module'].cumsum())['Module/Line Item']\
.apply(lambda g: pd.DataFrame({'Module':g.iloc[0],
                               'Line Item': g.iloc[1:].values}))\
.set_index('Module')

            Line Item
Module  
Module 1    Line Item 1
            Line Item 2
Module 2    Line Item 1
            Line Item 2

solution1
2 ACCEPTED 2018-11-04 18:37:00

solution2
1 2018-11-04 19:33:08
