Setting Values in Pandas Dataframe Based on Condition in Another Column

Question

I want to update the values in a pandas Series that satisfy a condition, replacing each one with the corresponding value from another column.

Specifically, wherever the Subcluster column equals 1, I want that value replaced with the value from the Cluster column in the same row.

For example:

Cluster	Subcluster
3	1
3	2
3	1
3	4
4	1
4	2

Should result in this:

Cluster	Subcluster
3	3
3	2
3	3
3	4
4	4
4	2

I've been trying to use apply and a lambda function, but can't seem to get it to work properly. Any advice would be greatly appreciated. Thanks!

Answer 1

You can use np.where:

import numpy as np

df['Subcluster'] = np.where(df['Subcluster'].eq(1), df['Cluster'], df['Subcluster'])

Output:

    Cluster  Subcluster
0         3           3
1         3           2
2         3           3
3         3           4
4         4           4
5         4           2
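For anyone who wants to run this end to end, here is a minimal sketch that builds the frame from the question and applies the same np.where call. The DataFrame construction is an assumption based on the example table in the question. Note that np.where works position by position on the underlying arrays, so the three arguments must have the same length.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's example table
df = pd.DataFrame({
    "Cluster": [3, 3, 3, 3, 4, 4],
    "Subcluster": [1, 2, 1, 4, 1, 2],
})

# Where Subcluster == 1 take Cluster, otherwise keep Subcluster
df["Subcluster"] = np.where(df["Subcluster"].eq(1), df["Cluster"], df["Subcluster"])

print(df)
```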

Answer 2

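In your case, try mask. Assign the result back to the column rather than passing inplace=True, since an in-place call on df.Subcluster may not propagate to df in recent pandas versions (copy-on-write):

df['Subcluster'] = df['Subcluster'].mask(df['Subcluster'].eq(1), df['Cluster'])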
df
Out[12]: 
   Cluster  Subcluster
0        3           3
1        3           2
2        3           3
3        3           4
4        4           4
5        4           2

Or

df.loc[df.Subcluster==1,'Subcluster'] = df['Cluster']
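Both variants rely on pandas aligning the replacement values by index label, which is why the full Cluster column can sit on the right-hand side of the .loc assignment. A small sketch of that behaviour, using a made-up three-row frame with a string index (the data and the names is_one and masked are illustrative only):

```python
import pandas as pd

# Hypothetical data with a non-default index to make alignment visible
df = pd.DataFrame(
    {"Cluster": [3, 3, 4], "Subcluster": [1, 2, 1]},
    index=["a", "b", "c"],
)

is_one = df["Subcluster"].eq(1)

# Option A: mask returns a new Series; df itself is not modified here
masked = df["Subcluster"].mask(is_one, df["Cluster"])

# Option B: .loc writes only the selected rows; the right-hand side
# is matched to those rows by index label
df.loc[is_one, "Subcluster"] = df["Cluster"]

print(masked.tolist())
print(df["Subcluster"].tolist())
```

Both approaches should give the same column; Option A is convenient when you want to keep the original frame untouched.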

Answer 3

All you really need here is .loc with a boolean mask. You don't need to store the mask in a variable; you can write the condition inline.

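import numpy as np
import pandas as pd

# Random sample data: cluster in 0-9, subcluster in 0-2
df = pd.DataFrame({'cluster': np.random.randint(0, 10, 10),
                   'subcluster': np.random.randint(0, 3, 10)})
df.to_clipboard(sep=',')  # copy the frame as CSV so it can be pasted below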

df at this point

,cluster,subcluster
0,8,0
1,5,2
2,6,2
3,6,1
4,8,0
5,1,1
6,0,0
7,6,0
8,1,0
9,3,1

Create and apply the mask (you could do this all in one line):

mask = df.subcluster == 1
df.loc[mask,'subcluster'] = df.loc[mask,'cluster']
df.to_clipboard(sep=',')

final output:

,cluster,subcluster
0,8,0
1,5,2
2,6,2
3,6,6
4,8,0
5,1,1
6,0,0
7,6,0
8,1,0
9,3,3
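Because the sample data above is random, rerunning it will produce different rows. A reproducible sketch of the same steps, assuming a seeded generator (the seed 0 and the names rng and original are arbitrary choices, not part of the answer above), with two checks that the update touched only the masked rows:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
df = pd.DataFrame({
    "cluster": rng.integers(0, 10, 10),
    "subcluster": rng.integers(0, 3, 10),
})
original = df.copy()

# Boolean mask of the rows to update
mask = df["subcluster"] == 1
df.loc[mask, "subcluster"] = df.loc[mask, "cluster"]

# Masked rows now carry the cluster value; all other rows are unchanged
assert (df.loc[mask, "subcluster"] == original.loc[mask, "cluster"]).all()
assert (df.loc[~mask, "subcluster"] == original.loc[~mask, "subcluster"]).all()
```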

Answer 4

Here's the lambda you couldn't write. With axis=1, apply passes each row to the lambda as a Series, so x['Cluster'] and x['Subcluster'] are that row's values in those columns.

df['Subcluster'] = df.apply(lambda x: x['Cluster'] if x['Subcluster'] == 1 else x['Subcluster'], axis = 1)

And the output:

    Cluster Subcluster
0   3       3
1   3       2
2   3       3
3   3       4
4   4       4
5   4       2
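To make the row-wise behaviour concrete, here is a small sketch that writes the lambda as a named function and checks it against a vectorized equivalent. The function name replace_one and the variables row_wise and vectorized are hypothetical; the data is the question's example. On large frames the vectorized form is generally much faster, since apply with axis=1 calls Python code once per row.

```python
import pandas as pd

df = pd.DataFrame({
    "Cluster": [3, 3, 3, 3, 4, 4],
    "Subcluster": [1, 2, 1, 4, 1, 2],
})

def replace_one(row):
    # row is a Series holding one row, indexed by column name
    return row["Cluster"] if row["Subcluster"] == 1 else row["Subcluster"]

# Row-by-row version, equivalent to the lambda above
row_wise = df.apply(replace_one, axis=1)

# Vectorized version of the same rule
vectorized = df["Subcluster"].mask(df["Subcluster"].eq(1), df["Cluster"])

print(row_wise.tolist() == vectorized.tolist())
```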

4 answers

solution1
2 2021-07-07 14:53:43

solution2
1 ACCEPTED 2021-07-07 14:56:41

solution3
0 2021-07-07 14:56:37

solution4
0 2021-07-07 15:17:09
