Correct way to assign values to a dataframe column based on the values of other columns

Question

I have a dataframe that looks something like this:

   a     b      c
0  A   1.0   10.0
1  B   2.0   20.0
2  C   3.0   30.0
3  A   4.0   40.0
4  B   5.0   50.0
5  C   6.0   60.0
6  A   7.0   70.0
7  B   8.0   80.0
8  C   9.0   90.0
9  A  10.0  100.0

I want to create a column 'd' whose value depends on 'a' so that if the value of column 'a' is in ['A','B'] then column 'd' gets the value in 'b' or else it gets the value in 'c'. The result I want is:

   a     b      c     d
0  A   1.0   10.0   1.0
1  B   2.0   20.0   2.0
2  C   3.0   30.0  30.0
3  A   4.0   40.0   4.0
4  B   5.0   50.0   5.0
5  C   6.0   60.0  60.0
6  A   7.0   70.0   7.0
7  B   8.0   80.0   8.0
8  C   9.0   90.0  90.0
9  A  10.0  100.0  10.0

I have tried:

df["d"] = np.nan

for i in range(df.shape[0]):
    if df.a.iloc[i] in ['A','B']:
        df.d.iloc[i] = df.b.iloc[i]
    elif df.a.iloc[i] in ['C']:
        df.d.iloc[i] = df.c.iloc[i]

This gives me the answer that I want, but I get the warning "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame".

I also know that a for loop is not ideal, so I tried to do this using a boolean mask, but

print(df.a in ['A','B'])

gives me the error "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

What is the best way to a) fix the for loop or b) replace the for loop with something more elegant? I have spent an hour going through SO, but I can't find a good answer for my specific problem. Any help is appreciated.

Answer 1

You can use np.where:

In [1696]: df['d'] = np.where(df['a'].isin(['A', 'B']), df['b'], df['c'])     
In [1697]: df 
Out[1697]: 
   a     b      c     d
0  A   1.0   10.0   1.0
1  B   2.0   20.0   2.0
2  C   3.0   30.0  30.0
3  A   4.0   40.0   4.0
4  B   5.0   50.0   5.0
5  C   6.0   60.0  60.0
6  A   7.0   70.0   7.0
7  B   8.0   80.0   8.0
8  C   9.0   90.0  90.0
9  A  10.0  100.0  10.0
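
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question (the construction and the name `mask` are my own, not from the original post) and also shows why `isin` works where `df.a in ['A','B']` raised the ValueError.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (construction is an assumption).
df = pd.DataFrame({
    "a": list("ABCABCABCA"),
    "b": np.arange(1.0, 11.0),          # 1.0 .. 10.0
    "c": np.arange(10.0, 101.0, 10.0),  # 10.0 .. 100.0
})

# `df.a in ['A', 'B']` needs one True/False for the whole Series, which is
# ambiguous. isin() tests each row and returns a boolean Series instead.
mask = df["a"].isin(["A", "B"])

# Take 'b' where the mask is True and 'c' everywhere else.
df["d"] = np.where(mask, df["b"], df["c"])
print(df)
```

Because the whole column is assigned in one step with `df["d"] = ...`, there is no chained indexing, so the SettingWithCopyWarning from the loop should not appear.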

Answer 2

You can use isin and np.select:

df['d'] = np.select( (df.a.isin(['A','B']), df.a.eq('C')),
                    (df.b, df.c), np.nan)
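
To see what the np.nan default does, here is a small sketch on a hypothetical frame (`demo`, with an extra category 'D' that is not in the question's data). Rows that match none of the conditions fall through to the default.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'D' matches neither condition below.
demo = pd.DataFrame({
    "a": ["A", "B", "C", "D"],
    "b": [1.0, 2.0, 3.0, 4.0],
    "c": [10.0, 20.0, 30.0, 40.0],
})

conditions = [demo["a"].isin(["A", "B"]), demo["a"].eq("C")]
choices = [demo["b"], demo["c"]]

# Conditions are checked in order; the first match wins.
# Rows matching no condition get the default value (NaN here).
demo["d"] = np.select(conditions, choices, default=np.nan)
print(demo)
```

This mirrors the if/elif in the original loop, where a row that is neither A/B nor C would have kept its initial np.nan.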

If column a contains only the values A, B, and C, as in the sample data, you can simply use np.where:

df['d'] = np.where(df.a.isin(['A','B']), df.b, df.c)

# or
# df['d'] = np.where(df.a.eq('C'), df.c, df.b)
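
If you would rather stay close to the original loop's logic (part a of the question), one option is a boolean mask with df.loc. This is only a sketch under the assumption that the frame matches the question's sample. The frame construction and the name `mask` are mine.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (construction is an assumption).
df = pd.DataFrame({
    "a": list("ABCABCABCA"),
    "b": np.arange(1.0, 11.0),
    "c": np.arange(10.0, 101.0, 10.0),
})

mask = df["a"].isin(["A", "B"])

# Start from 'c', then overwrite the masked rows with 'b'.
# df.loc[rows, column] is a single indexing step on df itself, unlike
# df.d.iloc[i] = ..., which first pulls out df.d and then assigns into it.
df["d"] = df["c"]
df.loc[mask, "d"] = df.loc[mask, "b"]
print(df)
```

The chained form `df.d.iloc[i] = ...` is what triggers the SettingWithCopyWarning in the question. Selecting rows and column together in one `.loc` call avoids it.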

2 answers

solution1
3 ACCEPTED 2020-05-25 15:32:25

solution2
2 2020-05-25 15:31:03