Setting Values in Pandas Dataframe Based on Condition in Another Column
I am looking to update the values in a pandas Series that satisfy a certain condition, taking the corresponding value from another column.
Specifically, I want to look at the Subcluster column and, if the value equals 1, update the record to the corresponding value in the Cluster column.
For example:
Cluster | Subcluster |
---|---|
3 | 1 |
3 | 2 |
3 | 1 |
3 | 4 |
4 | 1 |
4 | 2 |
Should result in this:
Cluster | Subcluster |
---|---|
3 | 3 |
3 | 2 |
3 | 3 |
3 | 4 |
4 | 4 |
4 | 2 |
I've been trying to use apply and a lambda function, but can't seem to get it to work properly. Any advice would be greatly appreciated. Thanks!
You can use np.where:
import numpy as np
df['Subcluster'] = np.where(df['Subcluster'].eq(1), df['Cluster'], df['Subcluster'])
Output:
Cluster Subcluster
0 3 3
1 3 2
2 3 3
3 3 4
4 4 4
5 4 2
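For anyone who wants to run the np.where approach end to end, here is a minimal sketch. It assumes the frame from the question is rebuilt by hand with the column names Cluster and Subcluster; nothing else is taken from outside the post.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (values copied from its table).
df = pd.DataFrame({
    "Cluster": [3, 3, 3, 3, 4, 4],
    "Subcluster": [1, 2, 1, 4, 1, 2],
})

# Where Subcluster equals 1, take the value from Cluster;
# everywhere else, keep the existing Subcluster value.
df["Subcluster"] = np.where(df["Subcluster"].eq(1), df["Cluster"], df["Subcluster"])

print(df)
```

np.where works element by element on the whole columns, so no Python-level loop over rows is involved.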
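In your case, try mask. Assigning the result back to the column avoids relying on inplace=True through attribute access, which does not reliably update df in newer pandas versions:
df['Subcluster'] = df['Subcluster'].mask(df['Subcluster'].eq(1), df['Cluster'])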
df
Out[12]:
Cluster Subcluster
0 3 3
1 3 2
2 3 3
3 3 4
4 4 4
5 4 2
Or:
df.loc[df.Subcluster==1,'Subcluster'] = df['Cluster']
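The mask and .loc variants can be compared side by side. The sketch below assumes the question's frame rebuilt by hand; the names via_mask and via_loc are illustrative only. In the .loc form the right-hand side is the full Cluster column, and pandas aligns it on the index, so each selected row receives its own Cluster value.

```python
import pandas as pd

df = pd.DataFrame({
    "Cluster": [3, 3, 3, 3, 4, 4],
    "Subcluster": [1, 2, 1, 4, 1, 2],
})

# Variant A: Series.mask returns a new Series in which entries
# where the condition is True are replaced by the other Series.
via_mask = df["Subcluster"].mask(df["Subcluster"].eq(1), df["Cluster"])

# Variant B: boolean-indexed assignment with .loc, done on a copy
# so the original frame stays available for comparison.
via_loc = df.copy()
via_loc.loc[via_loc["Subcluster"] == 1, "Subcluster"] = via_loc["Cluster"]

# Both variants should produce the same values.
print(via_mask.tolist() == via_loc["Subcluster"].tolist())
```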
Really all you need here is to use .loc with a mask (you don't actually need to create the mask separately; you could apply it inline).
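import numpy as np
import pandas as pd

# Random sample data: cluster in 0-9, subcluster in 0-2
df = pd.DataFrame({'cluster': np.random.randint(0, 10, 10),
                   'subcluster': np.random.randint(0, 3, 10)})
df.to_clipboard(sep=',')  # copy as CSV for pasting below
df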
At this point, df is:
,cluster,subcluster
0,8,0
1,5,2
2,6,2
3,6,1
4,8,0
5,1,1
6,0,0
7,6,0
8,1,0
9,3,1
Create and apply the mask (you could do this all in one line):
mask = df.subcluster == 1
df.loc[mask,'subcluster'] = df.loc[mask,'cluster']
df.to_clipboard(sep=',')
Final output:
,cluster,subcluster
0,8,0
1,5,2
2,6,2
3,6,6
4,8,0
5,1,1
6,0,0
7,6,0
8,1,0
9,3,3
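The one-line form mentioned above might look like the following sketch. It swaps np.random.randint for a seeded np.random.default_rng so the run is repeatable; the seed value 0 and the names rng, before, and changed are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
df = pd.DataFrame({
    "cluster": rng.integers(0, 10, 10),
    "subcluster": rng.integers(0, 3, 10),
})
before = df.copy()  # keep the starting values for the checks below

# Mask built inline: select rows where subcluster is 1 and
# overwrite subcluster with the index-aligned cluster value.
df.loc[df["subcluster"] == 1, "subcluster"] = df["cluster"]

# Rows that had subcluster == 1 now hold their cluster value;
# all other rows are unchanged.
changed = before["subcluster"] == 1
print((df.loc[changed, "subcluster"] == before.loc[changed, "cluster"]).all())
print((df.loc[~changed, "subcluster"] == before.loc[~changed, "subcluster"]).all())
```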
Here's the lambda you couldn't write. With axis=1, apply passes each row to the lambda as a Series, so x is the current row and you can use x['Cluster'] or x['Subcluster'] to read that row's value in a given column.
df['Subcluster'] = df.apply(lambda x: x['Cluster'] if x['Subcluster'] == 1 else x['Subcluster'], axis = 1)
And the output:
Cluster Subcluster
0 3 3
1 3 2
2 3 3
3 3 4
4 4 4
5 4 2
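To see what the function receives under apply with axis=1, the sketch below records the type of each argument and compares the row-wise result with a vectorized Series.where. The helper pick and the names seen_types, row_wise, and vectorized are hypothetical, introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Cluster": [3, 3, 3, 3, 4, 4],
    "Subcluster": [1, 2, 1, 4, 1, 2],
})

seen_types = set()

def pick(row):
    # With axis=1, `row` is a Series holding one row of the frame,
    # labelled by the column names.
    seen_types.add(type(row).__name__)
    return row["Cluster"] if row["Subcluster"] == 1 else row["Subcluster"]

row_wise = df.apply(pick, axis=1)

# Vectorized equivalent: keep Subcluster where it is not 1,
# otherwise take the value from Cluster.
vectorized = df["Subcluster"].where(df["Subcluster"] != 1, df["Cluster"])

print(seen_types)
print(row_wise.tolist() == vectorized.tolist())
```

The row-wise apply calls the Python function once per row, so on large frames the vectorized forms shown in the other answers are usually the faster choice.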