
PANDAS NEW COLUMN BASED ON MULTIPLE CRITERIA AND COLUMNS

I want to create a new column for a large table based on several criteria across multiple columns, and I am not sure of the best way to approach it.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': ['A', 'B', 'B', 'C', 'D'], 'b': ['y', 'n', 'y', 'n', np.nan],
                       'c': [10, 20, 10, 40, 30], 'd': [.3, .1, .4, .2, .1]})
    df.head()

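    def fun(df=df):
        df = df.copy()
        # Comparing whole columns yields boolean Series, which cannot drive a plain `if`,
        # so assign through boolean masks instead. Each comparison needs its own parentheses.
        df['new_Col'] = 0.0
        df.loc[(df.a == 'A') & (df.b == 'n'), 'new_Col'] = df.c + df.d
        df.loc[(df.a == 'A') & (df.b == 'y'), 'new_Col'] = df.d * 2
        return df
    fun()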

OR


    def fun(row):
        # Called once per row through apply(axis=1), so these are scalar comparisons
        if row.a == 'A' and row.b == 'n':
            return row.c + row.d
        if row.a == 'A' and row.b == 'y':
            return row.d * 2
        return 0
    df['new_Col'] = df.apply(fun, axis=1)

OR using np.where:

    # Nested so that the second condition does not overwrite the result of the first
    df['new_Col'] = np.where((df.a == 'A') & (df.b == 'n'), df.c + df.d,
                    np.where((df.a == 'A') & (df.b == 'y'), df.d * 2, 0))

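It looks like you need np.select. It takes a list of conditions and a matching list of choices, uses the first condition that is true for each row, and falls back to the default otherwise:

    a, n, y = df.a.eq('A'), df.b.eq('n'), df.b.eq('y')

    df['result'] = np.select([a & n, a & y], [df.c + df.d, df.d * 2], default=0)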
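For anyone who wants to run this end to end, here is a minimal self-contained sketch of the np.select approach. It assumes the question's sample data plus one extra row with a = 'A' and b = 'n' so that both branches are exercised; the names `is_a`, `is_n` and `is_y` are only illustrative. Using `.eq()` (or wrapping each comparison in parentheses) matters because `&` binds more tightly than `==`.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one extra row (A, n) as an assumption
df = pd.DataFrame({
    'a': ['A', 'B', 'B', 'C', 'D', 'A'],
    'b': ['y', 'n', 'y', 'n', np.nan, 'n'],
    'c': [10, 20, 10, 40, 30, 50],
    'd': [0.3, 0.1, 0.4, 0.2, 0.1, 0.9],
})

# Boolean masks; a NaN in column b simply compares as False
is_a = df['a'].eq('A')
is_n = df['b'].eq('n')
is_y = df['b'].eq('y')

# The first matching condition wins; rows matching nothing get the default
df['result'] = np.select(
    [is_a & is_n, is_a & is_y],
    [df['c'] + df['d'], df['d'] * 2],
    default=0,
)
print(df)
```

Only the two rows with a = 'A' should end up non-zero; every other row, including the one where b is NaN, falls through to the default.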

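Here is an arithmetic approach: a boolean mask behaves as 0 or 1 when multiplied, so each mask keeps its own result and zeroes out the others. I added one more row to your sample to cover the case a = 'A' and b = 'n':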

Sample:

    Out[1369]:
       a    b   c    d
    0  A    y  10  0.3
    1  B    n  20  0.1
    2  B    y  10  0.4
    3  C    n  40  0.2
    4  D  NaN  30  0.1
    5  A    n  50  0.9

    nc = df.a.eq('A') & df.b.eq('y')
    mc = df.a.eq('A') & df.b.eq('n')
    nr = df.d * 2
    mr = df.c + df.d

    df['new_col'] = nc * nr + mc * mr

    Out[1371]:
       a    b   c    d  new_col
    0  A    y  10  0.3      0.6
    1  B    n  20  0.1      0.0
    2  B    y  10  0.4      0.0
    3  C    n  40  0.2      0.0
    4  D  NaN  30  0.1      0.0
    5  A    n  50  0.9     50.9
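One caveat with the arithmetic version is worth checking on your own data: multiplying by a False mask gives 0 only when the value is a real number, because 0 * NaN is still NaN. The sketch below uses a small made-up frame with a hypothetical missing value in column d (the sample in the question has none) and compares the arithmetic result with np.select.

```python
import numpy as np
import pandas as pd

# Made-up data: the NaN in column d sits in a row that matches neither condition
df = pd.DataFrame({
    'a': ['A', 'B', 'A'],
    'b': ['y', 'n', 'n'],
    'c': [10, 20, 50],
    'd': [0.3, np.nan, 0.9],
})

nc = df['a'].eq('A') & df['b'].eq('y')
mc = df['a'].eq('A') & df['b'].eq('n')

# Arithmetic approach: the False mask multiplies NaN, and the NaN survives
arithmetic = nc * (df['d'] * 2) + mc * (df['c'] + df['d'])

# np.select never touches the unselected values, so the default is used instead
selected = np.select([mc, nc], [df['c'] + df['d'], df['d'] * 2], default=0)

print(arithmetic.tolist())
print(selected.tolist())
```

If the numeric columns can contain missing values, np.select (or a `fillna(0)` on the arithmetic result) is probably the safer choice.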
