
Pandas: collapse duplicate rows by aggregating a column

I have a dataframe like this:

+----------+---------+
| username | role    |
+----------+---------+
| foo      | user    |
+----------+---------+
| foo      | analyst |
+----------+---------+
| bar      | admin   |
+----------+---------+

and I would like to remove the repeated users (those that appear twice or more) by aggregating the role column, so that I obtain the following dataframe:

+----------+---------------+
| username | role          |
+----------+---------------+
| foo      | user, analyst |
+----------+---------------+
| bar      | admin         |
+----------+---------------+

So far I have tried using a pivot table like this:

table = pd.pivot_table(df, index='username', columns='role')

and also the groupby function, but neither seems to be the right approach. What is the right way to do this?
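For reference, a sketch of how the pivot_table attempt could be adapted, assuming a reasonably recent pandas. Passing role as `columns` asks pandas to turn each role into a separate column header; to combine the roles into a single cell instead, role can be passed as `values` with a string-joining function as `aggfunc`. The sample data is rebuilt from the table above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "username": ["foo", "foo", "bar"],
    "role": ["user", "analyst", "admin"],
})

# Aggregate the 'role' values per username by joining them with ", ".
# pivot_table sorts the index, so 'bar' comes before 'foo'.
table = pd.pivot_table(df, index="username", values="role", aggfunc=", ".join)
print(table)
```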

What you want to do is group the rows by username, so groupby is one way to go. Usually when you use groupby you apply an aggregation function to the remaining columns, for example sum, mean, min or similar. But you can also define your own aggregation function and pass it to agg.

def merge_strings(series):
    # This function receives a Series with all the values of the column for one group.
    # For example, for foo the Series will contain ['user', 'analyst'].
    # The built-in method Series.str.cat() concatenates a Series of strings.
    return series.str.cat(sep=', ')

Then we simply call groupby and specify that the role column should be aggregated with our custom function:

df.groupby('username').agg({'role': merge_strings})
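A self-contained version of this approach, using the sample data from the question. The added reset_index() (not part of the original answer) turns the username group keys back into an ordinary column, so the result has the same two-column layout as the desired output.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "username": ["foo", "foo", "bar"],
    "role": ["user", "analyst", "admin"],
})


def merge_strings(series):
    # Join all roles belonging to one username, e.g. "user, analyst"
    return series.str.cat(sep=", ")


# Group by username, aggregate 'role' with the custom function,
# then move 'username' from the index back into a column.
result = df.groupby("username").agg({"role": merge_strings}).reset_index()
print(result)
```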

You can create a list or a comma-separated string using the following:

df.groupby('username')['role'].agg(list).reset_index()

Output:

  username             role
0      bar          [admin]
1      foo  [user, analyst]
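If you start from the list version but still need the comma-separated string shown in the question, the lists can be joined afterwards. A minimal sketch, relying on pandas' Series.str.join, which joins the elements of list-valued cells with the given separator:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "username": ["foo", "foo", "bar"],
    "role": ["user", "analyst", "admin"],
})

# Step 1: collect each user's roles into a Python list
lists = df.groupby("username")["role"].agg(list).reset_index()

# Step 2: join the elements of each list into a single string
lists["role"] = lists["role"].str.join(", ")
print(lists)
```

Keeping the list form is handy if you later need to test membership of individual roles; the string form matches the layout asked for in the question.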

OR

df.groupby('username')['role'].agg(lambda x: ', '.join(x)).reset_index()

Output:

  username           role
0      bar          admin
1      foo  user, analyst
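Note that groupby sorts the group keys by default, which is why bar comes before foo in both outputs above, while the question lists foo first. Passing sort=False keeps the groups in order of first appearance. A sketch using the same sample data:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "username": ["foo", "foo", "bar"],
    "role": ["user", "analyst", "admin"],
})

# sort=False: groups appear in the order their keys first occur (foo, then bar)
result = df.groupby("username", sort=False)["role"].agg(", ".join).reset_index()
print(result)
```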
