
Unstack (pivot?) dataframe in Pandas

I have a dataframe somewhat like this:

   ID | Relationship | First Name | Last Name |     DOB     |     Address   |    Phone
0 | 2 |     Self     |   Vegeta   |  Saiyan   |  01/01/1949 | Saiyan Planet | 123-456-7891
1 | 2 |     Spouse   |   Bulma    |  Saiyan   |  04/20/1969 | Saiyan Planet | 123-456-7891
2 | 3 |     Self     |   Krilin   |  Human    |  08/21/1992 | Planet Earth  | 789-456-4321
3 | 4 |     Self     |   Goku     |  Kakarot  |  05/04/1975 | Planet Earth  | 321-654-9870
4 | 4 |     Child    |   Gohan    |  Kakarot  |  04/02/2001 | Planet Earth  | 321-654-9870
5 | 5 |     Self     |   Freezer  |  Fridge   |  09/15/1955 |  Deep Space   | 456-788-9568

I'd like the rows that share an ID to be appended to the right of the first row with that ID.

Example:

   ID | Relationship | First Name | Last Name |     DOB     |     Address   |    Phone     |  Spouse_First Name |  Spouse_Last Name  |  Spouse_DOB  |  Child_First Name  |  Child_Last Name  |   Child_DOB   |
0 | 2 |     Self     |   Vegeta   |  Saiyan   |  01/01/1949 | Saiyan Planet | 123-456-7891 |      Bulma         |        Saiyan      |   04/20/1969 |                    |                   |               |
1 | 3 |     Self     |   Krilin   |  Human    |  08/21/1992 | Planet Earth  | 789-456-4321 |                    |                    |              |                    |                   |               |
2 | 4 |     Self     |   Goku     |  Kakarot  |  05/04/1975 | Planet Earth  | 321-654-9870 |                    |                    |              |        Gohan       |      Kakarot      |   04/02/2001  |
3 | 5 |     Self     |   Freezer  |  Fridge   |  09/15/1955 |  Deep Space   | 456-788-9568 |                    |                    |              |                    |                   |               |

My real dataframe has more columns, but they hold the same information whenever rows share an ID, so there is no need to duplicate them. I only need to add, to the right, the columns that I choose, which in this case would be First Name, Last Name and DOB, with each new column label prefixed by whatever is in the 'Relationship' column (I can rename them later if that can't be done directly; I just wanted to illustrate my point).

I have tried several approaches, and it seems that unstack or pivot is the way to go, but I have not been able to make either of them work.

Any help would be greatly appreciated.

This solution assumes that the DataFrame is indexed by the ID column.
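If ID is still an ordinary column, as in the question's table, it can be moved into the index first. A minimal sketch, assuming the frame is called df and using only a few of the question's columns for brevity:

```python
import pandas as pd

# Abbreviated sample data (only three of the question's columns).
df = pd.DataFrame(
    {
        "ID": [2, 2, 3, 4, 4, 5],
        "Relationship": ["Self", "Spouse", "Self", "Self", "Child", "Self"],
        "First Name": ["Vegeta", "Bulma", "Krilin", "Goku", "Gohan", "Freezer"],
    }
)

# Move ID from a regular column into the index; rows that share an ID
# now share an index label, which is what the join further down relies on.
df = df.set_index("ID")
```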

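import pandas as pd

# Relationships other than 'Self', in order of first appearance.
# (A list keeps the column order deterministic; a set does not.)
others = [r for r in df['Relationship'].unique() if r != 'Self']

not_self = (
    df.query("Relationship != 'Self'")
    .pivot(columns='Relationship')   # columns become (field, relationship)
    .swaplevel(axis=1)               # reorder to (relationship, field)
    .reindex(                        # group the columns by relationship
        pd.MultiIndex.from_product(
            (others, df.columns.drop('Relationship'))
        ),
        axis=1
    )
)
# Flatten the two-level labels, e.g. ('Spouse', 'DOB') -> 'Spouse DOB'
not_self.columns = [' '.join((a, b)) for a, b in not_self.columns]
# Keep one row per ID (the 'Self' row) and attach the other members' columns
result = df.query("Relationship == 'Self'").join(not_self)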
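The question asks for only a chosen subset of columns and for labels such as Spouse_First Name. The following is a sketch of one way to get that, using set_index plus unstack instead of pivot; it is a variant, not the code above, and the names keep, others and wide are made up for illustration:

```python
import pandas as pd

# The question's sample data, indexed by ID.
df = pd.DataFrame(
    {
        "ID": [2, 2, 3, 4, 4, 5],
        "Relationship": ["Self", "Spouse", "Self", "Self", "Child", "Self"],
        "First Name": ["Vegeta", "Bulma", "Krilin", "Goku", "Gohan", "Freezer"],
        "Last Name": ["Saiyan", "Saiyan", "Human", "Kakarot", "Kakarot", "Fridge"],
        "DOB": ["01/01/1949", "04/20/1969", "08/21/1992",
                "05/04/1975", "04/02/2001", "09/15/1955"],
        "Address": ["Saiyan Planet", "Saiyan Planet", "Planet Earth",
                    "Planet Earth", "Planet Earth", "Deep Space"],
        "Phone": ["123-456-7891", "123-456-7891", "789-456-4321",
                  "321-654-9870", "321-654-9870", "456-788-9568"],
    }
).set_index("ID")

# Columns to repeat for each non-'Self' relationship (illustrative choice).
keep = ["First Name", "Last Name", "DOB"]

others = df[df["Relationship"] != "Self"]

# Index by (ID, Relationship), keep the chosen columns, then move the
# Relationship level into the columns: labels become (field, relationship).
wide = others.set_index("Relationship", append=True)[keep].unstack("Relationship")

# Flatten to 'Relationship_Field' labels.
wide.columns = [f"{rel}_{col}" for col, rel in wide.columns]

# Order the columns by relationship (first appearance), then by field.
wide = wide[[f"{rel}_{col}" for rel in others["Relationship"].unique() for col in keep]]

# One row per ID: the 'Self' row plus the other members' columns.
result = df[df["Relationship"] == "Self"].join(wide)
```

Like pivot, unstack raises a ValueError if an ID has two rows with the same relationship (two children, say), so data of that shape would need a per-relationship counter before reshaping.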

Please let me know if this is not what was wanted.
