I'm trying to find an efficient way to determine, in a DataFrame, which row has the highest value in one column (value) among rows whose entries in another column (String) are identical, and to record this in a new column (motif) for later use.
Here is an example DataFrame:
String N value
0 EXAM 10 250
1 EXAMP 20 350
2 EXAMPLE 30 450
3 EXAMPLE 40 400
4 EXA 50 300
5 EX 60 100
Here is what I'm looking for:
String N value motif
0 EXAM 10 250 NaN
1 EXAMP 20 350 NaN
2 EXAMPLE 30 450 1
3 EXAMPLE 40 400 NaN
4 EXA 50 300 NaN
5 EX 60 100 NaN
I tried a split-apply-combine approach, roughly like this (pseudocode):
def group_motif(df):
    if df.groupby(['String']).size() > 1:
        # for the row with the highest value in column 'value':
        #     create a new column in df called 'motif' and set it to 1 in that row
Then I was thinking of applying this function with groupby.apply and combining the resulting groups, but I can't get it right.
Is there an efficient way to achieve that other than using groupby?
IIUC, you can groupby on 'String', filter it down to the groups with more than one row, and then call idxmax to return the row labels that have the max value and assign 1 to those rows:
In [201]:
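# filter drops the grouping, so group again before idxmax to get one label per duplicated String
df.loc[df.groupby('String').filter(lambda x: len(x) > 1).groupby('String')['value'].idxmax(), 'motif'] = 1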
df
Out[201]:
String N value motif
0 EXAM 10 250 NaN
1 EXAMP 20 350 NaN
2 EXAMPLE 30 450 1
3 EXAMPLE 40 400 NaN
4 EXA 50 300 NaN
5 EX 60 100 NaN
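For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same groupby / filter / idxmax idea. The helper name mark_motif is hypothetical (it does not appear in the question), and the sketch assumes the maximum should be taken separately within each duplicated String, which is why it groups a second time after filter.

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "String": ["EXAM", "EXAMP", "EXAMPLE", "EXAMPLE", "EXA", "EX"],
    "N": [10, 20, 30, 40, 50, 60],
    "value": [250, 350, 450, 400, 300, 100],
})


def mark_motif(frame):
    """Hypothetical helper: set motif = 1 on the max-'value' row of each duplicated String."""
    out = frame.copy()
    # Keep only the rows whose String occurs more than once.
    dup = out.groupby("String").filter(lambda g: len(g) > 1)
    # filter returns a plain DataFrame, so group again to get one label per String.
    top = dup.groupby("String")["value"].idxmax()
    out["motif"] = np.nan
    out.loc[top.values, "motif"] = 1
    return out


result = mark_motif(df)
print(result)
```

On the example frame this marks only row 2. If several Strings are duplicated, each of those groups gets its own marked row; ties within a group go to the first occurrence, since idxmax returns the first label of the maximum.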