Correct way to assign values to a dataframe column based on the values of other columns

I have a dataframe that looks something like this:

   a     b      c
0  A   1.0   10.0
1  B   2.0   20.0
2  C   3.0   30.0
3  A   4.0   40.0
4  B   5.0   50.0
5  C   6.0   60.0
6  A   7.0   70.0
7  B   8.0   80.0
8  C   9.0   90.0
9  A  10.0  100.0

I want to create a column 'd' whose value depends on 'a' so that if the value of column 'a' is in ['A','B'] then column 'd' gets the value in 'b' or else it gets the value in 'c'. The result I want is:

   a     b      c     d
0  A   1.0   10.0   1.0
1  B   2.0   20.0   2.0
2  C   3.0   30.0  30.0
3  A   4.0   40.0   4.0
4  B   5.0   50.0   5.0
5  C   6.0   60.0  60.0
6  A   7.0   70.0   7.0
7  B   8.0   80.0   8.0
8  C   9.0   90.0  90.0
9  A  10.0  100.0  10.0

I have tried:

df["d"] = np.nan

for i in range(df.shape[0]):
    if df.a.iloc[i] in ['A','B']:
        df.d.iloc[i] = df.b.iloc[i]
    elif df.a.iloc[i] in ['C']:
        df.d.iloc[i] = df.c.iloc[i]

This gives me the answer that I want, but I get the warning "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame".
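
For part a), the warning most likely comes from chained indexing: `df.d.iloc[i] = ...` first selects column `d` and then assigns into that intermediate object, which pandas cannot guarantee is a view of `df`. A minimal sketch of the same loop using one indexing call per assignment is below. The way the sample frame is built is an assumption (the question does not show it), and `DataFrame.at` is one option among several (`df.loc[i, "d"]` would also work).

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the construction is an assumption.
df = pd.DataFrame({
    "a": list("ABCABCABCA"),
    "b": np.arange(1.0, 11.0),
    "c": np.arange(10.0, 101.0, 10.0),
})
df["d"] = np.nan

# Address row label and column name in a single call, so the assignment
# goes straight to df instead of to an intermediate selection.
for i in df.index:
    if df.at[i, "a"] in ["A", "B"]:
        df.at[i, "d"] = df.at[i, "b"]
    elif df.at[i, "a"] == "C":
        df.at[i, "d"] = df.at[i, "c"]

print(df)
```

This keeps the loop's logic unchanged; it is still slower than a vectorized solution on large frames.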

I also know that a for loop is not ideal, so I tried to do this using a boolean mask, but

print(df.a in ['A','B'])

raises the error "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
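
The error arises because Python's `in` against a list compares the whole Series to each list element and then needs a single True/False from the resulting Series. An element-wise membership test is what `Series.isin` provides. A short sketch, with the Series rebuilt from the sample data (the construction is an assumption):

```python
import pandas as pd

# Column 'a' from the sample data, rebuilt here for illustration.
a = pd.Series(list("ABCABCABCA"), name="a")

# Element-wise membership test: one boolean per row.
mask = a.isin(["A", "B"])
print(mask.tolist())
```

The resulting boolean Series can be used directly as a mask for selection or assignment.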

What is the best way to a) fix the for loop or b) replace the for loop with something more elegant? I have spent an hour going through SO, but I can't find a good answer for my specific problem. Any help is appreciated.

You can use np.where:

In [1696]: df['d'] = np.where(df['a'].isin(['A', 'B']), df['b'], df['c'])     
In [1697]: df 
Out[1697]: 
   a     b      c     d
0  A   1.0   10.0   1.0
1  B   2.0   20.0   2.0
2  C   3.0   30.0  30.0
3  A   4.0   40.0   4.0
4  B   5.0   50.0   5.0
5  C   6.0   60.0  60.0
6  A   7.0   70.0   7.0
7  B   8.0   80.0   8.0
8  C   9.0   90.0  90.0
9  A  10.0  100.0  10.0
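
For anyone who wants to reproduce this outside an IPython session, here is a self-contained sketch. The frame construction is an assumption, since the question only shows the printed frame.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the construction is an assumption.
df = pd.DataFrame({
    "a": list("ABCABCABCA"),
    "b": np.arange(1.0, 11.0),
    "c": np.arange(10.0, 101.0, 10.0),
})

# Where the mask is True take 'b', otherwise take 'c'.
mask = df["a"].isin(["A", "B"])
df["d"] = np.where(mask, df["b"], df["c"])
print(df)
```

`np.where` returns a plain NumPy array, which pandas assigns to the new column by position.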

You can use isin and np.select:

df['d'] = np.select( (df.a.isin(['A','B']), df.a.eq('C')),
                    (df.b, df.c), np.nan)
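
The difference from a two-way choice shows up when column 'a' holds a value that matches neither condition. The sketch below adds a hypothetical category "D", which is not in the original data, to illustrate that such rows receive the default (NaN here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["A", "B", "C", "D"],      # "D" is a hypothetical extra category
    "b": [1.0, 2.0, 3.0, 4.0],
    "c": [10.0, 20.0, 30.0, 40.0],
})

# Conditions are checked in order; the first one that is True picks the value.
conditions = [df["a"].isin(["A", "B"]), df["a"].eq("C")]
choices = [df["b"], df["c"]]
df["d"] = np.select(conditions, choices, default=np.nan)
print(df)
```

Rows for A and B take 'b', the row for C takes 'c', and the unmatched row is left as NaN, mirroring the `elif` branch in the original loop.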

If column 'a' contains only the values A, B, and C, as in the sample data, you can simply use np.where:

df['d'] = np.where(df.a.isin(['A','B']), df.b, df.c)

# or
# df['d'] = np.where(df.a.eq('C'), df.c, df.b)
