
How to compare all values in each row of a DataFrame in Python

Good morning guys, my problem is simple:

Given a dataframe like this:

import pandas as pd 
  
df = pd.DataFrame({ 'a': [1, 2, 3, 4, 5, 6],
                    'b': [8, 18, 27, 20, 33, 49],
                    'c': [2, 24, 6, 16, 20, 52]})
print(df)

For each row, I would like to find the maximum value and compare it with all the other values in that row. If the maximum exceeds every other value by more than 10, a new column should contain the string 'yes'; otherwise it should contain 'not'.

   a   b   c
0  1   8   2
1  2  18  24
2  3  27   6
3  4  20  16
4  5  33  20
5  6  49  52

I expect this result:

   a   b   c  res
0  1   8   2  not
1  2  18  24  not
2  3  27   6  yes
3  4  20  16  not
4  5  33  20  yes
5  6  49  52  not

Thanks a lot in advance.

I think the code below can help:

import pandas as pd

df = pd.DataFrame({ 'a': [1, 2, 3, 4, 5, 6],
                    'b': [8, 18, 27, 20, 33, 49],
                    'c': [2, 24, 6, 16, 20, 52]})

def find(x):
    if x > 10:
        return "yes"
    else:
        return "not"

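# Difference between each row's maximum and its second-largest value
df["diff"] = df.max(axis=1) - df.apply(lambda row: row.nlargest(2).values[-1], axis=1)
df["res"] = df["diff"].apply(find)
# Drop the helper column; columns= already selects the axis, so axis= is not needed
df.drop(columns="diff", inplace=True)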
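
As a possible alternative to the row-wise apply above, the same comparison could probably be done in a vectorized way with NumPy. This is only a sketch and not part of the original answer; it assumes every column is numeric and that "maximum minus second-largest value" is the intended test. The names sorted_vals and gap are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [8, 18, 27, 20, 33, 49],
                   'c': [2, 24, 6, 16, 20, 52]})

# Sort each row in ascending order; the last two entries of every row
# are then the second-largest value and the maximum.
sorted_vals = np.sort(df.to_numpy(), axis=1)

# Gap between the maximum and the second-largest value, one entry per row.
gap = sorted_vals[:, -1] - sorted_vals[:, -2]

# Map the boolean test to the labels requested in the question.
df['res'] = np.where(gap > 10, 'yes', 'not')
print(df)
```

If the maximum is more than 10 above the second-largest value, it is also more than 10 above every smaller value, so checking that single gap is enough.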


This should do the trick.

It is roughly two to ten times as fast as the other answers provided here.

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [8, 18, 27, 20, 33, 49],
                   'c': [2, 24, 6, 16, 20, 52]})

df["res"] = df.apply(lambda row: "yes" if all(row.apply(lambda val: max(row) - val > 10 or val == max(row))) else "not", axis=1)

print(df)

Result:

   a   b   c  res
0  1   8   2  not
1  2  18  24  not
2  3  27   6  yes
3  4  20  16  not
4  5  33  20  yes
5  6  49  52  not
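
One thing worth knowing: this one-liner and the nlargest-based approach agree on the question's data, but they seem to differ when the maximum appears more than once in a row. The sketch below uses a made-up one-row frame to show that; the function names are hypothetical and the second function is a rewrite of the one-liner's logic, not the original code.

```python
import pandas as pd

# Made-up row in which the maximum (30) occurs twice.
ties = pd.DataFrame({'a': [30], 'b': [30], 'c': [5]})

def gap_to_runner_up(row):
    # nlargest(2) keeps both tied maxima, so the gap is 0.
    first, second = row.nlargest(2)
    return "yes" if first - second > 10 else "not"

def all_others_far_below(row):
    # Values equal to the maximum are skipped, as in the one-liner.
    top = row.max()
    return "yes" if ((top - row > 10) | (row == top)).all() else "not"

print(ties.apply(gap_to_runner_up, axis=1).tolist())      # ['not']
print(ties.apply(all_others_far_below, axis=1).tolist())  # ['yes']
```

Which behaviour is right depends on how ties should be treated, which the question does not specify.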

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [8, 18, 27, 20, 33, 49],
                   'c': [2, 24, 6, 16, 20, 52]})

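def _max(row):
    # Unpack the two largest values in the row
    first, second = row.nlargest(2)
    # Return the labels requested in the question rather than booleans
    return "yes" if first - second > 10 else "not"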

df["res"] = df.apply(_max, axis=1)
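
The speed claim made in one of the answers above is easy to check on your own machine. Below is a rough timing sketch using the standard timeit module. The benchmark frame (the question's data repeated 100 times) and the wrapper function names are assumptions made for illustration, and the numbers will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                     'b': [8, 18, 27, 20, 33, 49],
                     'c': [2, 24, 6, 16, 20, 52]})

# Hypothetical benchmark input: the question's six rows repeated 100 times.
big = pd.concat([base] * 100, ignore_index=True)

def with_nlargest(frame):
    # Row-wise apply that compares the two largest values of each row.
    def label(row):
        first, second = row.nlargest(2)
        return "yes" if first - second > 10 else "not"
    return frame.apply(label, axis=1)

def with_nested_apply(frame):
    # The one-liner from the answer above, wrapped in a function.
    return frame.apply(
        lambda row: "yes" if all(row.apply(
            lambda val: max(row) - val > 10 or val == max(row))) else "not",
        axis=1)

# Time three runs of each approach on the same input.
for func in (with_nlargest, with_nested_apply):
    seconds = timeit.timeit(lambda: func(big), number=3)
    print(f"{func.__name__}: {seconds:.3f} s for 3 runs")
```

Both functions return the same labels on the question's data, so the comparison is like for like.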

