
For a pandas DataFrame, is there a way to display rows of the same category together as one group while retaining all the other values as strings?

Assume I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({"category": ['Associates', 'Manager', 'Associates', 'Associates', 'Engineer', 'Engineer', 'Manager', 'Engineer'],
                   "name": ['Abby', 'Jenny', 'Thomas', 'John', 'Eve', 'Danny', 'Kenny', 'Helen'],
                   "email": ['Abby@email.com', 'Jenny@email.com', 'Thomas@email.com', 'John@email.com', 'Eve@email.com', 'Danny@email.com', 'Kenny@email.com', 'Helen@email.com']})

How can I display the DataFrame in this way?

Output:

category     name     email
Associates   Abby     Abby@email.com
             Thomas   Thomas@email.com
             John     John@email.com
Manager      Jenny    Jenny@email.com
             Kenny    Kenny@email.com
Engineer     Eve      Eve@email.com
             Danny    Danny@email.com
             Helen    Helen@email.com

Any advice? Can it be done with groupby functions? Thanks!

It's not really clear to me what you mean by "display". To get printed output similar (though not identical) to the one you are showing, you don't need .groupby(). Just do

df = df.set_index(["category", "name"]).sort_index()

and get

                              email
category   name                    
Associates Abby      Abby@email.com
           John      John@email.com
           Thomas  Thomas@email.com
Engineer   Danny    Danny@email.com
           Eve        Eve@email.com
           Helen    Helen@email.com
Manager    Jenny    Jenny@email.com
           Kenny    Kenny@email.com
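Note that sort_index() orders both levels alphabetically, whereas the output in the question keeps the categories in order of first appearance (Associates, Manager, Engineer) and the names in their original order. A minimal sketch of one way to get that ordering, assuming an ordered Categorical plus a stable sort is acceptable (the variable names order and grouped are only illustrative):

import pandas as pd

df = pd.DataFrame({
    "category": ["Associates", "Manager", "Associates", "Associates",
                 "Engineer", "Engineer", "Manager", "Engineer"],
    "name": ["Abby", "Jenny", "Thomas", "John", "Eve", "Danny", "Kenny", "Helen"],
})
df["email"] = df["name"] + "@email.com"  # same emails as in the question

# Categories in order of first appearance: Associates, Manager, Engineer
order = df["category"].unique()
df["category"] = pd.Categorical(df["category"], categories=order, ordered=True)

# A stable sort groups the categories but keeps the original row order inside each one
grouped = df.sort_values("category", kind="stable").set_index(["category", "name"])
print(grouped)

Because the sort is stable, the names within each category stay in the order they had in the original frame rather than being alphabetized.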

If you really want to blank out the repeated values in the category column itself, you could try something like

df = df.sort_values(["category", "name"], ignore_index=True)
df.loc[df["category"] == df["category"].shift(), "category"] = ""

to get

     category    name             email
0  Associates    Abby    Abby@email.com
1                John    John@email.com
2              Thomas  Thomas@email.com
3    Engineer   Danny   Danny@email.com
4                 Eve     Eve@email.com
5               Helen   Helen@email.com
6     Manager   Jenny   Jenny@email.com
7               Kenny   Kenny@email.com
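The snippet above overwrites the category values in df, which makes later filtering or grouping on that column awkward. As a rough sketch, the same idea can be wrapped in a helper that returns a display-only copy and prints it without the integer index; blank_repeats and display_df are hypothetical names chosen for this example, not part of pandas:

import pandas as pd

df = pd.DataFrame({
    "category": ["Associates", "Manager", "Associates", "Associates",
                 "Engineer", "Engineer", "Manager", "Engineer"],
    "name": ["Abby", "Jenny", "Thomas", "John", "Eve", "Danny", "Kenny", "Helen"],
})
df["email"] = df["name"] + "@email.com"  # same emails as in the question


def blank_repeats(frame, column):
    """Return a sorted copy of frame with repeated values in column replaced by ''."""
    out = frame.sort_values(column, kind="stable", ignore_index=True)
    # True on every row whose value equals the value in the row above it
    repeated = out[column].eq(out[column].shift())
    out[column] = out[column].mask(repeated, "")
    return out


display_df = blank_repeats(df, "category")
print(display_df.to_string(index=False))  # the original df is left untouched

Keeping the blanked version separate means df still holds the full category on every row for any further processing.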

This takes two lines of code. First, set both category and name as the index:

df.set_index(['category','name'],inplace=True)

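Next, use groupby(...).sum() to get your desired output. This works here because every (category, name) pair is unique, so each group contains a single email string that is returned unchanged; if a pair occurred more than once, sum() would concatenate the strings.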

df.groupby(level=[0,1]).sum()
Out[67]: 
                              email
category   name                    
Associates Abby      Abby@email.com
           John      John@email.com
           Thomas  Thomas@email.com
Engineer   Danny    Danny@email.com
           Eve        Eve@email.com
           Helen    Helen@email.com
Manager    Jenny    Jenny@email.com
           Kenny    Kenny@email.com
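To illustrate how sum() behaves on string columns when a (category, name) pair is not unique, here is a small sketch with made-up duplicate data (the second email address is invented for the example). Joining with an explicit separator through agg is one possible alternative, not necessarily what the answer intended:

import pandas as pd

# Hypothetical data: the same person appears twice
dup = pd.DataFrame({
    "category": ["Manager", "Manager"],
    "name": ["Jenny", "Jenny"],
    "email": ["Jenny@email.com", "jenny@work.com"],
})

# sum() on strings concatenates them with no separator
summed = dup.groupby(["category", "name"])["email"].sum()

# An explicit join keeps the individual values readable
joined = dup.groupby(["category", "name"])["email"].agg(", ".join)

print(summed)
print(joined)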

You can use the groupby() function for this. Sample code is shown below.

df.groupby(['category','name']).max()

The result is now indexed by category and name, so it prints in a grouped layout similar to the one you described (sorted alphabetically). If you want to remove the index, use the code below:

df.groupby(['category','name']).max().reset_index()
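If the goal is only to print each category once followed by its rows, the groups can also be iterated directly. The following is a minimal sketch, assuming plain text output is enough; sort=False keeps the categories in order of first appearance, and the names lines and report are illustrative:

import pandas as pd

df = pd.DataFrame({
    "category": ["Associates", "Manager", "Associates", "Associates",
                 "Engineer", "Engineer", "Manager", "Engineer"],
    "name": ["Abby", "Jenny", "Thomas", "John", "Eve", "Danny", "Kenny", "Helen"],
})
df["email"] = df["name"] + "@email.com"  # same emails as in the question

lines = []
# sort=False yields the groups in the order the categories first appear
for category, group in df.groupby("category", sort=False):
    lines.append(category)
    for name, email in zip(group["name"], group["email"]):
        lines.append(f"    {name:<8}{email}")

report = "\n".join(lines)
print(report)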
