How can I check whether rows are identical in all but one column, and if so verify that the remaining column differs by some constant k?
Using the iris data set with dimension (150, 4), I want to see whether rows are identical in columns 1, 2 and 4, and if so, verify that their values in the 3rd column differ by some constant k. This has to be done for every possible row combination.
```python
#### load data ###
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df.head()
```
The following attempt (I chose k=4 here) gives me the error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." It also does not account for all the rows, because it only compares each row with the next one:
```python
list=range(0,148)
for row in list:
    if df.iloc[row,:]==df.iloc[row+1,:]:
        df.iloc[row,2]-df.iloc[row+1,2]<=4
    else:
        print('nothing')
```
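The error message basically tells you how to fix this. Comparing two rows with `==` returns a boolean Series with one value per column, and `if` cannot decide whether such a Series is true as a whole. So instead of:

```python
if df.iloc[row, :] == df.iloc[row + 1, :]:
```

it needs to be:

```python
if (df.iloc[row, :] == df.iloc[row + 1, :]).all():
```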
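Here is a minimal reproduction. It uses a small hypothetical two-row DataFrame instead of the iris data. The bare `==` comparison produces a boolean Series, so the `if` raises the same ValueError. Wrapping the comparison in `.all()` reduces it to a single boolean.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the iris data
df = pd.DataFrame([[5.1, 3.5, 1.4, 0.2],
                   [5.1, 3.5, 1.4, 0.2]])

# Element-wise comparison: one boolean per column, i.e. a Series
comparison = df.iloc[0, :] == df.iloc[1, :]
print(type(comparison).__name__)   # Series

raised = False
try:
    if comparison:   # ambiguous: which of the four booleans should decide?
        pass
except ValueError as err:
    raised = True
    print(err)       # "The truth value of a Series is ambiguous. ..."

# .all() collapses the Series into a single True/False
rows_equal = bool(comparison.all())
print(rows_equal)    # True
```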
First you need to filter the columns you want to compare. In this case that is columns 0, 1 and 3 (0-based positions, i.e. the question's columns 1, 2 and 4), which you compare like this:

```python
df.iloc[row1, [0, 1, 3]] == df.iloc[row2, [0, 1, 3]]
```

This returns a Series of True or False values, but you need all of those columns to be the same. To accomplish that you need the `.all()` method, which returns True only if all values in the Series are True. In summary:

```python
if (df.iloc[row1, identical_columns] == df.iloc[row2, identical_columns]).all():
```
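The next sketch shows why the column filter matters. It uses two hypothetical rows that agree everywhere except in column 2. Comparing the whole rows gives False, while restricting the comparison to `identical_columns` gives True:

```python
import pandas as pd

# Two hypothetical rows that differ only in column 2
df = pd.DataFrame([[3, 4, 5, 1],
                   [3, 4, 4, 1]])
identical_columns = [0, 1, 3]
row1, row2 = 0, 1

# Whole-row comparison includes column 2, so it cannot pass
whole_row_equal = (df.iloc[row1, :] == df.iloc[row2, :]).all()
# Comparison restricted to the columns that must match
selected_equal = (df.iloc[row1, identical_columns] == df.iloc[row2, identical_columns]).all()

print(whole_row_equal)   # False: column 2 holds 5 vs 4
print(selected_equal)    # True: columns 0, 1 and 3 match
```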
And since you need to iterate over every possible row combination, a double for loop will do nicely (here `m` is the number of rows, `df.shape[0]`):

```python
for row1 in range(m - 1):
    for row2 in range(row1 + 1, m):
        # Check for every row combination whether the selected columns are equal
        if (df.iloc[row1, identical_columns] == df.iloc[row2, identical_columns]).all():
            pass
```
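The double loop visits each unordered pair of rows exactly once, so it checks m * (m - 1) / 2 pairs in total (11175 for the 150 iris rows). As an alternative, the standard library's `itertools.combinations` generates the same pairs in the same order. In this sketch, `m = 5` is only a hypothetical size:

```python
from itertools import combinations

m = 5  # hypothetical number of rows; 150 for iris

# Pairs produced by the answer's double loop
loop_pairs = [(row1, row2) for row1 in range(m - 1) for row2 in range(row1 + 1, m)]

# The same pairs, in the same order, from itertools
combo_pairs = list(combinations(range(m), 2))

print(loop_pairs == combo_pairs)   # True
print(len(combo_pairs))            # 10, i.e. m * (m - 1) // 2
```

With this, the loop header can be written as `for row1, row2 in combinations(range(m), 2):` without changing its behaviour.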
Altogether:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randint(0, 100, (10, 4)))
m = df.shape[0]
identical_columns = [0, 1, 3]
k = 4

# Force the values of two rows so that at least one pair passes
df.iloc[2, :] = [3, 4, 5, 1]
df.iloc[3, :] = [3, 4, 4, 1]

for row1 in range(m - 1):
    for row2 in range(row1 + 1, m):
        # Check for every row combination whether the selected columns are equal
        if (df.iloc[row1, identical_columns] == df.iloc[row2, identical_columns]).all():
            # abs() so that the order of the two rows does not matter
            if abs(df.iloc[row1, 2] - df.iloc[row2, 2]) <= k:
                # TODO: Implement your logic
                print(f"rows {row1} and {row2} pass")
            else:
                print(f"row {row1} and row {row2} don't pass")
```
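For the 150-row iris frame the pairwise loop works, but a vectorized alternative avoids the Python-level double loop. This sketch swaps in a different technique: a self-merge with pandas `DataFrame.merge` on the three columns that must be identical. That yields only the row pairs that already share those values, and the column-2 difference is then checked for each pair.

The sketch makes a few assumptions:

- "Differing by k" means an absolute difference of at most k, matching the answer's `<= k` test.
- The small frame with iris-style column names is made up, so the example runs without scikit-learn.
- `key_cols` and `val_col` are illustrative names.

```python
import pandas as pd

k = 4
# Hypothetical rows; with the real data use
# df = pd.DataFrame(iris.data, columns=iris.feature_names)
df = pd.DataFrame(
    [[5.1, 3.5, 1.4, 0.2],
     [5.1, 3.5, 4.9, 0.2],
     [5.1, 3.5, 9.9, 0.2],
     [6.3, 2.9, 5.6, 1.8]],
    columns=["sepal length (cm)", "sepal width (cm)",
             "petal length (cm)", "petal width (cm)"],
)
key_cols = [df.columns[i] for i in (0, 1, 3)]   # columns that must be identical
val_col = df.columns[2]                          # column whose values may differ

# Self-merge on the key columns: every pair of rows sharing those values.
# reset_index() keeps the original row numbers in an "index" column.
pairs = df.reset_index().merge(df.reset_index(), on=key_cols, suffixes=("_1", "_2"))

# Keep each unordered pair once (drops self-pairs and mirrored duplicates)
pairs = pairs[pairs["index_1"] < pairs["index_2"]].copy()

pairs["diff"] = (pairs[f"{val_col}_1"] - pairs[f"{val_col}_2"]).abs()
pairs["passes"] = pairs["diff"] <= k
print(pairs[["index_1", "index_2", "diff", "passes"]])
```

Because the merge only produces pairs that already match on the key columns, the work roughly scales with the number of matching pairs rather than with all m * (m - 1) / 2 combinations. If many rows share the same key values, the merged frame can still grow large.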