How to check if 3 columns are the same and add a new column with that value if they are?
I have a dataframe that looks like this:
index Column A Column B Column C
0 alice alice alice
1 nick nick john
2 juli nick alice
I want to check whether Column A, Column B and Column C are equal. If they are, I want to add that value as a new Column D. If not, add None to Column D.
This is what I have so far:
def func(row):
    if ((row['Column A']) == (row['Column B']) == (row['Column C'])):
        df['Column D'] = df['Column A']
    else:
        df['Column D'] = None
When I apply the function using df.apply(lambda row: func(row), axis=1), I do not get the desired output.
I got something like this:
index Column A Column B Column C Column D
0 alice alice alice None
1 nick nick john None
2 juli nick alice None
whereas I want the output to look like this:
index Column A Column B Column C Column D
0 alice alice alice alice
1 nick nick john None
2 juli nick alice None
Any help on this?
Use numpy where.
Here you take a subset of the dataframe, store it as an array arr, and then compare the first column of that array against the rest of the columns. Rows where every column matches the first get the value from column A; all other rows get None.
import numpy as np
arr = df[['A','B','C']].values
df['D'] = np.where((arr == arr[:, [0]]).all(axis=1),df['A'],None)
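To try this against the sample data, here is a minimal self-contained sketch. It assumes the frame is rebuilt by hand from the table in the question and that the columns are really named 'Column A', 'Column B' and 'Column C' (the snippet above shortens them to 'A', 'B', 'C').

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed column names).
df = pd.DataFrame({
    "Column A": ["alice", "nick", "juli"],
    "Column B": ["alice", "nick", "nick"],
    "Column C": ["alice", "john", "alice"],
})

cols = ["Column A", "Column B", "Column C"]
arr = df[cols].values

# arr[:, [0]] keeps the first column as shape (n, 1), so it broadcasts
# against every column; .all(axis=1) is True only if the whole row matches.
all_equal = (arr == arr[:, [0]]).all(axis=1)

df["Column D"] = np.where(all_equal, df["Column A"], None)
print(df)
```

Only the first row should end up with a value in Column D; depending on the pandas version the other rows may display as None or NaN.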
or
def func(row):
    if ((row['A']) == (row['B']) == (row['C'])):
        return row['A']
    else:
        return None
df['D'] = df.apply(lambda row: func(row), axis=1)
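The key difference from the function in the question is that this one returns a value per row instead of assigning to df. In the original, every call overwrote the whole of Column D, so the result of the last row (which does not match) won and the entire column became None. A runnable sketch of the row-wise version follows, assuming the question's column names and a hypothetical helper name same_or_none:

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed column names).
df = pd.DataFrame({
    "Column A": ["alice", "nick", "juli"],
    "Column B": ["alice", "nick", "nick"],
    "Column C": ["alice", "john", "alice"],
})

def same_or_none(row):
    # Return the result for this row only; never assign to df in here.
    if row["Column A"] == row["Column B"] == row["Column C"]:
        return row["Column A"]
    return None

# apply(..., axis=1) collects one return value per row into a Series.
df["Column D"] = df.apply(same_or_none, axis=1)
print(df)
```

This is easier to read than the numpy version but runs a Python function per row, so it will likely be slower on large frames.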
In your if clause you wrote:
(row['Column A']) == (row['Column B']) == (row['Column C'])
I'm not sure that is the right way to do it. Have you tried the code below as your if clause?
((row['Column A']) == (row['Column B'])) and ((row['Column B']) == (row['Column C']))
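For what it's worth, Python chains comparison operators, so for plain scalar values such as the strings in a single row, a == b == c is evaluated as (a == b) and (b == c). A quick check, using values taken from the sample rows (the variable names are only for illustration):

```python
# Values from row 1 of the sample data: two match, one differs.
a, b, c = "nick", "nick", "john"

chained = a == b == c             # Python expands this to the line below
explicit = (a == b) and (b == c)

# Values from row 0: all three match.
all_same = "alice" == "alice" == "alice"

print(chained, explicit, all_same)
```

So both forms of the if clause should behave the same on a row, and the condition itself is probably not what causes the all-None column. Note that this only holds for scalars: on whole Series, chaining with == raises an error and the comparisons have to be combined with & instead.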
I tried df['Column D'] = np.where((((df['Column A'])==(df['Column B']))& (df['Column B'] == df['Column C'])),df['Column A'],None)
and this worked! Thanks, all, for the ideas.