I have a dataframe that contains duplicated rows, simply because two of the columns differ between them.
df
[A] [B] [C] [D] [E]
123 X Y 5 A
135 D E 4 B
434 R F 3 C
434 E Z 5 C
In the above example, column [A] should have unique values and is my key for determining duplicated rows. As shown, [A] repeats at 434 because [B] and [C] contain different objects. As a result, column [D] is split from 8 into 3 and 5 across the two rows, and [E] is repeated. (Column [D] is an arbitrary split based on other factors that aren't important to this example.)
My goal is to drop the two columns causing the duplication and then aggregate columns [A], [D], and [E]. Is there a way I can use .groupby() and set rules for aggregating non-integer values (for column [E])? Aggregate is probably not the best word, as I'm simply taking the repeated instance and bringing it up a level. For column [E], I'm thinking of a rule that outputs the first instance, since both values are unchanging.
I started off with the following method in mind: df.groupby('A').agg()
The example's output should show:
df_agg
[A] [D] [E]
123 5 A
135 4 B
434 8 C
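For reference, the example can be sketched as a runnable snippet; the bracketed column names are an assumption carried over literally from the question's display:

```python
import pandas as pd

# Rebuild the question's example frame; the bracketed labels are
# assumed to be the literal column names.
df = pd.DataFrame({
    '[A]': [123, 135, 434, 434],
    '[B]': ['X', 'D', 'R', 'E'],
    '[C]': ['Y', 'E', 'F', 'Z'],
    '[D]': [5, 4, 3, 5],
    '[E]': ['A', 'B', 'C', 'C'],
})

# Sum [D] per key and keep the first [E]; [B] and [C] are dropped
# simply by leaving them out of the aggregation dictionary.
df_agg = df.groupby('[A]', as_index=False).agg({'[D]': 'sum', '[E]': 'first'})
print(df_agg)
```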
This is as simple as a groupby + agg:
df.groupby('[A]', as_index=False).agg({'[D]': 'sum', '[E]': 'first'})
[A] [D] [E]
0 123 5 A
1 135 4 B
2 434 8 C
If [A] is the index, then change the groupby syntax a bit:
df.groupby(level=0).agg({'[D]': 'sum', '[E]': 'first'})
[D] [E]
[A]
123 5 A
135 4 B
434 8 C
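A minimal sketch of that index variant, assuming the same example data with [A] moved into the index:

```python
import pandas as pd

# Same example data, but with [A] set as the index first.
df = pd.DataFrame({
    '[A]': [123, 135, 434, 434],
    '[D]': [5, 4, 3, 5],
    '[E]': ['A', 'B', 'C', 'C'],
}).set_index('[A]')

# level=0 groups on the index rather than on a column.
df_agg = df.groupby(level=0).agg({'[D]': 'sum', '[E]': 'first'})
print(df_agg)
```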
Use groupby with agg and a dictionary defining how to aggregate the columns.
df.groupby('[A]').agg({'[D]': 'sum', '[E]': 'first'}).reset_index()
Output:
[A] [D] [E]
0 123 5 A
1 135 4 B
2 434 8 C
With this :-), then just select what you need from the result
df.groupby('[A]', as_index=False).agg(lambda x: x.iloc[0] if x.dtype == 'object' else x.sum())
Output:
[A] [B] [C] [D] [E]
0 123 X Y 5 A
1 135 D E 4 B
2 434 R F 8 C