Assume that I have the following data frame df1:
A B C D
0 foo one 1 0
1 bar two 2 1
2 foo two 3 0
3 bar two 4 1
4 foo two 5 0
5 bar two 6 1
6 foo one 7 0
7 foo two 8 1
I would like to turn it into a dataframe df2 like this:
A B C D
foo [one,two] [1,3,5,7,8] 0
bar [two] [2,4,6] 1
More precisely:
grouped by A, i.e. column A is the index and in every row the value of A is unique
columns B and C contain the aggregate set of values that occur. For A = "foo", B was either "one" or "two", while for "bar" it was only "two". Conceptually this is a set, but I am also asking what the most elegant way is to represent this with pandas
column D does not contain sets, because for foo D is always 0 and for bar it is always 1. If there is always a 1:1 relationship between the index value and a column value, then the column should not contain sets.
I expected there to be a one-line aggregation a la df1.groupby("A").aggregate_like_this(), but I had no luck so far finding it.
Use groupby + agg:
import numpy as np

# B and C: sorted unique values per group; D: first value per group
f = {'B': lambda x: np.unique(x).tolist(),
     'C': lambda x: np.unique(x).tolist(),
     'D': 'first'}
df.groupby('A', as_index=False).agg(f).reindex(columns=df.columns)
A B C D
0 bar [two] [2, 4, 6] 1
1 foo [one, two] [1, 3, 5, 7, 8] 0
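For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same groupby + agg call on the sample data from the question, plus a variant that returns real Python sets instead of lists. The names unique_list, as_lists and as_sets are made up for illustration; the set variant is only one possible way to get sets and is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "two", "two", "two", "two", "two", "one", "two"],
    "C": [1, 2, 3, 4, 5, 6, 7, 8],
    "D": [0, 1, 0, 1, 0, 1, 0, 1],
})


def unique_list(s):
    """Sorted unique values of a group, as a plain Python list."""
    return np.unique(s).tolist()


# Lists, as in the answer above (A becomes the index here).
as_lists = df.groupby("A").agg({"B": unique_list, "C": unique_list, "D": "first"})

# Hypothetical variant: Python sets instead of lists.
as_sets = df.groupby("A").agg(
    {"B": lambda s: set(s), "C": lambda s: set(s), "D": "first"}
)

print(as_lists)
print(as_sets)
```

np.unique sorts its result, so the lists come back ordered; sets carry no order. Lists tend to be easier to serialize later, so which one is nicer probably depends on what you do with df2 afterwards.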
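If you cannot determine in advance which values of A have a 1:1 relationship with D, check with groupby + nunique and then filter your dataset accordingly. In the sample data, row 7 has D = 1 for foo, so foo takes two distinct D values and is filtered out; only bar remains.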
x = df.groupby('A').D.nunique().eq(1)
df = df[df.A.isin(x[x].index)]
df
A B C D
1 bar two 2 1
3 bar two 4 1
5 bar two 6 1
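The same filtering step can probably be written in other ways too. As a sketch, here are two alternatives using GroupBy.filter and GroupBy.transform on the sample data; the names kept and kept_via_mask are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "two", "two", "two", "two", "two", "one", "two"],
    "C": [1, 2, 3, 4, 5, 6, 7, 8],
    "D": [0, 1, 0, 1, 0, 1, 0, 1],
})

# Keep only the groups of A in which D takes a single distinct value.
kept = df.groupby("A").filter(lambda g: g["D"].nunique() == 1)

# Same idea with a boolean mask aligned to the original rows.
mask = df.groupby("A")["D"].transform("nunique").eq(1)
kept_via_mask = df[mask]

print(kept)
```

Both keep the original row labels, so the result should match the filtered frame shown above.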
df.groupby('A', as_index=False).agg(f).reindex(columns=df.columns)
A B C D
0 bar [two] [2, 4, 6] 1
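If you would rather not drop rows, another option is to decide per column whether it is 1:1 with A and build the aggregation dict from that. The following is a rough sketch under that assumption, not something pandas ships: aggregate_by is a hypothetical helper, and the frame named fixed (row 7 changed to D = 0) is an assumption made purely to show the 1:1 case described in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "two", "two", "two", "two", "two", "one", "two"],
    "C": [1, 2, 3, 4, 5, 6, 7, 8],
    "D": [0, 1, 0, 1, 0, 1, 0, 1],
})


def aggregate_by(frame, key):
    """Hypothetical helper: scalar for 1:1 columns, sorted unique list otherwise."""
    grouped = frame.groupby(key)
    # True for columns that take exactly one distinct value in every group.
    one_to_one = grouped.nunique().eq(1).all()
    spec = {
        col: "first" if one_to_one[col] else (lambda s: np.unique(s).tolist())
        for col in one_to_one.index
    }
    return grouped.agg(spec)


# On the data as posted, D is not 1:1 with A (row 7), so D becomes a list column.
as_posted = aggregate_by(df, "A")

# Assumption for illustration only: set row 7 to D = 0 so that D is 1:1 with A.
fixed = df.copy()
fixed.loc[7, "D"] = 0
as_described = aggregate_by(fixed, "A")

print(as_posted)
print(as_described)
```

The decision is made per column rather than per group, so B stays a list column even for bar, where it only ever holds "two".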