
Grouping pandas dataframe and collecting multiple values into sets

Assume that I have the following data frame df1:

     A    B  C   D 
0  foo  one  1  0
1  bar  two  2  1
2  foo  two  3  0
3  bar  two  4  1
4  foo  two  5  0
5  bar  two  6  1
6  foo  one  7  0
7  foo  two  8  1

I would like to turn it into a dataframe df2 like this:

A    B          C            D
foo  [one,two]  [1,3,5,7,8]  0
bar  [two]      [2,4,6]      1

More precisely:

  • grouped by A, i.e. column A is the index and in every row the value of A is unique

  • columns B and C contain the aggregate set of values that occur. For A = "foo", B was either "one" or "two", while for "bar" it was only "two".

    • logically, this should be a set in which every value that occurs is present exactly once. It could be a Python set, but I am also asking what the most elegant way is to represent this with pandas
  • column D does not contain sets, because for foo D is always 0 and for bar it is always 1. If there is always a 1:1 relationship between the index value and a column value, then the column should not contain sets.

I expected there to be a one-line aggregation à la df1.groupby("A").aggregate_like_this(), but so far I have had no luck finding it.

Use groupby + agg:

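import numpy as np

# Per-column aggregation: sorted unique values for B and C, first value for D
f = {'B' : lambda x: np.unique(x).tolist(),
     'C' : lambda x: np.unique(x).tolist(),
     'D' : 'first'
}

# df is the question's df1; reindex restores the original column order
df.groupby('A', as_index=False).agg(f).reindex(columns=df.columns)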

     A           B                C  D
0  bar       [two]        [2, 4, 6]  1
1  foo  [one, two]  [1, 3, 5, 7, 8]  0 
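
Since the question also mentions a Python set as a possible representation, here is a minimal sketch of that variant. It rebuilds the question's df1 so that it runs on its own; the name as_sets is only illustrative. Sets are unordered, so unlike the np.unique version the elements come in no guaranteed order.

```python
import pandas as pd

# The question's df1, rebuilt so the snippet is self-contained.
df1 = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "two", "two", "two", "two", "two", "one", "two"],
    "C": [1, 2, 3, 4, 5, 6, 7, 8],
    "D": [0, 1, 0, 1, 0, 1, 0, 1],
})

# Collect the distinct values of B and C per group as Python sets.
# A is kept as the index here, as the question asks.
as_sets = df1.groupby("A").agg({
    "B": lambda s: set(s),
    "C": lambda s: set(s),
})
print(as_sets)
```

The cells are ordinary Python objects stored in object-dtype columns, so vectorized pandas operations will not apply to them directly.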

If you cannot determine in advance which values of A have a 1:1 relationship with D, check with groupby + nunique and then filter your dataset accordingly.

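# True for each value of A whose rows all share a single D value
x = df.groupby('A').D.nunique().eq(1)
# Keep only the rows that belong to those groups
df = df[df.A.isin(x[x].index)]
df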

     A    B  C  D
1  bar  two  2  1
3  bar  two  4  1
5  bar  two  6  1

df.groupby('A', as_index=False).agg(f).reindex(columns=df.columns)

     A      B          C  D
0  bar  [two]  [2, 4, 6]  1
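
An alternative to filtering rows is to decide per column whether to collect values or keep a scalar, which may be closer to what the question asks for. The following is only a sketch: collect_per_group is a hypothetical helper, not a pandas function, and it treats a column as 1:1 only if every group has exactly one distinct value in it. Note that in the data as posted, row 7 has D = 1 for foo, so D is not 1:1 there and stays a list; dropping that row makes D collapse to scalars.

```python
import pandas as pd

# The question's df1, rebuilt so the snippet is self-contained.
df1 = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "two", "two", "two", "two", "two", "one", "two"],
    "C": [1, 2, 3, 4, 5, 6, 7, 8],
    "D": [0, 1, 0, 1, 0, 1, 0, 1],
})


def collect_per_group(df, key):
    """Hypothetical helper: group by `key`, keeping a scalar for columns
    that are 1:1 with the key and a sorted list of distinct values otherwise."""
    grouped = df.groupby(key)
    out = {}
    for col in df.columns:
        if col == key:
            continue
        if grouped[col].nunique().eq(1).all():
            # Every group has a single distinct value: keep it as a scalar.
            out[col] = grouped[col].first()
        else:
            # Otherwise collect the distinct values of each group.
            out[col] = grouped[col].agg(lambda s: sorted(set(s)))
    return pd.DataFrame(out)


full = collect_per_group(df1, "A")                    # D stays a list
trimmed = collect_per_group(df1.drop(index=7), "A")   # D becomes scalar
print(full)
print(trimmed)
```

Missing values are not handled specially in this sketch; nunique ignores NaN by default, so a column containing NaN may need extra care.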

