Python Pandas: How to compare cell values in two columns and use an If...Else statement to create another column with new values
I've been trying and researching a lot how to do this, but I'm having trouble combining pandas with if/else: getting values by index, comparing them with if/else, and assigning a column with codes/values. Explanation: I have a table with ID and Cod columns. I want to compare each cell in the ID column with the cell above it. If the ID equals the previous row's ID AND the Cod also equals the previous row's Cod, then the Result column should be "NO"; if the ID matches but the Cod differs, "PASS"; and if the ID doesn't match, "UNKNOWN".
This is the formula I made in Excel: =IF(B3=B2,IF(C3=C2,"NO","PASS"),"UNKNOWN")
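Excel's relative references (row 3 compared with row 2, then filled down) have a direct pandas counterpart in Series.shift(1), which moves every value down one row so each row lines up with the row above it. A minimal sketch on a hypothetical four-value ID column (the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample values, for illustration only
ids = pd.Series([1, 2, 2, 3], name="ID")

# Row i of prev_ids holds the value from row i-1; row 0 has no predecessor, so it becomes NaN
prev_ids = ids.shift(1)

# Element-wise comparison, the equivalent of B3=B2 filled down the whole column
same_as_prev = ids == prev_ids

print(prev_ids.tolist())      # [nan, 1.0, 2.0, 2.0]
print(same_as_prev.tolist())  # [False, False, True, False]
```

Note that shift() turns an integer column into floats because of the NaN it introduces, but the equality comparison still works as expected.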
Below are some of my code attempts. I can create two columns, one with the first test (on the ID column) and one with the second test (on the Cod column), both returning Boolean results, but I can't get an if/else to combine them and produce the values I want in another column. I would appreciate it if someone could help me.
df = df.sort_values(by=['ID'])
df['matchesID'] = df['ID'].shift(1) == df['ID']
df['matchesCod'] = df['Cod'].shift(1) == df['Cod']
or
df = df.sort_values(by=['ID'])
test = (df['ID'].shift(1) == df['ID']) & (df['Cod'].shift(1) == df['Cod'])
I was trying something like this:
if df['ID'].shift(1) == df['ID'] and df['Cod'].shift(1) == df['Cod']:
    listProg.append('not')
elif df['ID'].shift(1) == df['ID'] and df['Cod'].shift(1) != df['Cod']:
    listProg.append('pass')
else:
    listProg.append('Unknown')
But the result is: "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()".
Any help is appreciated, whether with pandas, without it, or a mix of both. I just need it to work. Thank you.
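For a version without vectorized pandas operations, the if/elif attempt above can work if each comparison is between two single values instead of two whole Series; the ValueError comes from asking if to decide on a Series of many booleans at once. A minimal row-by-row sketch, assuming the data is already sorted by ID and uses the ID/Cod column names; the sample values are hypothetical:

```python
import pandas as pd

# Hypothetical sample data, already sorted by ID
df = pd.DataFrame({"ID": [1, 2, 2, 4, 4], "Cod": [1, 1, 1, 1, 2]})

results = []
prev_id, prev_cod = None, None  # the first row has no previous row to compare with
# .tolist() yields plain Python ints, so comparing with None is simply False
for cur_id, cur_cod in zip(df["ID"].tolist(), df["Cod"].tolist()):
    # Each comparison is now scalar vs scalar, so `if` receives a single bool
    if cur_id == prev_id and cur_cod == prev_cod:
        results.append("NO")
    elif cur_id == prev_id:
        results.append("PASS")
    else:
        results.append("UNKNOWN")
    prev_id, prev_cod = cur_id, cur_cod

df["Result"] = results
print(df)
```

This mirrors the Excel formula directly but loops in Python, so on large frames a vectorized approach will be noticeably faster.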
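A similar approach in pandas is to use the numpy.where function, which evaluates the condition element-wise over whole columns. (The ValueError above occurs because comparing two Series produces a Series of booleans, and a plain Python if statement can't reduce that to a single True/False.)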
With this code:
import numpy as np
df['Result'] = np.where(df['ID'] == df['ID'].shift(),
                        np.where(df['Cod'] == df['Cod'].shift(), 'NO', 'PASS'),
                        'UNKNOWN')
I get the following results:
   ID  Cod   Result
0   1    1  UNKNOWN
1   2    1  UNKNOWN
2   2    1       NO
3   3    1  UNKNOWN
4   4    1  UNKNOWN
5   4    2     PASS
6   4    2       NO
7   5    1  UNKNOWN
8   6    1  UNKNOWN
This seems more in line with your description of how the Result value is derived.
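An alternative, if the three outcomes read more clearly as a flat list than as nested calls, is numpy.select: it checks a list of conditions in order, takes the choice for the first one that is True, and uses default for rows where none match. The sketch below reuses the ID/Cod values from the result table above and builds the same two Boolean masks the question computes as matchesID and matchesCod (renamed here to matches_id and matches_cod); it should give the same Result column as the nested np.where:

```python
import numpy as np
import pandas as pd

# Sample ID/Cod values taken from the result table above
df = pd.DataFrame({"ID":  [1, 2, 2, 3, 4, 4, 4, 5, 6],
                   "Cod": [1, 1, 1, 1, 1, 2, 2, 1, 1]})

# Same Boolean masks as in the question: does each row match the row above?
matches_id = df["ID"] == df["ID"].shift(1)
matches_cod = df["Cod"] == df["Cod"].shift(1)

conditions = [matches_id & matches_cod,   # same ID and same Cod
              matches_id & ~matches_cod]  # same ID, different Cod
choices = ["NO", "PASS"]

# Rows matching neither condition (a new ID) fall through to the default
df["Result"] = np.select(conditions, choices, default="UNKNOWN")
print(df)
```

Note the & and ~ operators: they combine Boolean Series element-wise, which is what the Python keywords and/not cannot do on a Series.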