How do I check whether rows are identical in almost all columns and, if so, verify that the remaining column differs by some constant k?

Question

Using the iris data set, which has shape (150, 4), I want to check whether two rows are identical in columns 1, 2 and 4 and, if so, verify that their values in the 3rd column differ by some constant k. This has to be done for every possible combination of rows.

#### load data ####
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df.head()

The loop below gives me the error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." and it also does not account for all the rows.

I chose k=4 here.

list=range(0,148)

for row in list:
    if df.iloc[row,:]==df.iloc[row+1,:]:
        df.iloc[row,2]-df.iloc[row+1,2]<=4
    else:
        print('nothing')

Answer 1

The error message basically tells you how to fix this:

if df.iloc[row,:]==df.iloc[row+1,:]:

It needs to be:

if (df.iloc[row,:]==df.iloc[row+1,:]).all():
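
To see why the original comparison fails, here is a minimal sketch on a small made-up frame (the column names and values are hypothetical, not the iris data). Comparing two rows yields a boolean Series with one entry per column, which `if` cannot interpret on its own; `.all()` collapses it to a single value.

```python
import pandas as pd

# Hypothetical toy frame, only for illustration
toy = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [3, 7]})

# Element-wise comparison: a boolean Series with one entry per column
elementwise = toy.iloc[0, :] == toy.iloc[1, :]

try:
    if elementwise:  # raises ValueError: the truth value is ambiguous
        pass
    raised = False
except ValueError:
    raised = True

# .all() reduces the Series to one boolean
rows_equal = bool(elementwise.all())                # column "c" differs
first_two_equal = bool(elementwise.iloc[:2].all())  # "a" and "b" match

print(raised, rows_equal, first_two_equal)
```

Note that this compares every column, so on its own it does not yet leave the 3rd column out of the equality check.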

Answer 2

First you need to select the columns you want to compare, in this case columns 0, 1 and 3, with the comparison df.iloc[row1, [0, 1, 3]] == df.iloc[row2, [0, 1, 3]]. This returns a boolean Series with one True or False value per column. But you need all of the columns to be the same, and to accomplish that you need the .all() method, which returns True only if every value in the Series is True. In summary:

if (df.iloc[row1,identical_columns] == df.iloc[row2, identical_columns]).all():

And since you need to iterate over every possible row combination, a double for loop will do nicely.

for row1 in range(m-1):
    for row2 in range(row1+1, m):
        # Check for every row combination if the columns are equal
        if (df.iloc[row1,identical_columns] == df.iloc[row2, identical_columns]).all():
            pass
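
The same enumeration of row pairs can also be written with `itertools.combinations` from the standard library, which yields each unordered pair of positions exactly once. This is only an alternative sketch; the toy data and the `matching_pairs` list are assumptions made for illustration.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data: rows 0 and 2 agree in columns 0, 1 and 3
df = pd.DataFrame([[3, 4, 5, 1],
                   [9, 9, 9, 9],
                   [3, 4, 4, 1]])
identical_columns = [0, 1, 3]
m = df.shape[0]

matching_pairs = []
# combinations(range(m), 2) yields (0, 1), (0, 2), (1, 2): every pair with row1 < row2
for row1, row2 in combinations(range(m), 2):
    if (df.iloc[row1, identical_columns] == df.iloc[row2, identical_columns]).all():
        matching_pairs.append((row1, row2))

print(matching_pairs)  # [(0, 2)]
```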

Altogether:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randint(0, 100, (10, 4)))
m = df.shape[0]

identical_columns = [0, 1, 3]
k = 4

# Force two rows to pass the check
df.iloc[2, :] = [3, 4, 5, 1]
df.iloc[3, :] = [3, 4, 4, 1]

for row1 in range(m-1):
    for row2 in range(row1+1, m):
        # Check for every row combination if the columns are equal
        if (df.iloc[row1, identical_columns] == df.iloc[row2, identical_columns]).all():
            # abs() so that the order of the two rows does not matter
            if abs(df.iloc[row1, 2] - df.iloc[row2, 2]) <= k:
                # TODO: Implement your logic
                print('We pass!')
        else:
            print(f"row {row1} and row {row2} don't pass")
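
For larger frames the double loop can get slow, so here is a possible vectorized sketch using NumPy broadcasting. The helper name `find_pairs` and its parameters are hypothetical, and it assumes that "differ by k" means an absolute difference of at most k, mirroring the `<= k` test in the loop; swap in `==` if an exact difference is wanted. It builds m x m boolean arrays, which is fine for 150 rows but grows quadratically.

```python
import numpy as np
import pandas as pd


def find_pairs(df, identical_columns, diff_column, k):
    """Return (row1, row2) position pairs, row1 < row2, that agree on
    identical_columns and whose diff_column values differ by at most k."""
    values = df.to_numpy()
    same = values[:, identical_columns]  # shape (m, number of compared columns)
    diff = values[:, diff_column]        # shape (m,)

    # equal[i, j] is True when rows i and j agree on every compared column
    equal = (same[:, None, :] == same[None, :, :]).all(axis=2)
    # close[i, j] is True when the remaining column differs by at most k
    close = np.abs(diff[:, None] - diff[None, :]) <= k
    # Strict upper triangle keeps each pair once and drops i == j
    upper = np.triu(np.ones(equal.shape, dtype=bool), 1)

    i, j = np.nonzero(equal & close & upper)
    return list(zip(i.tolist(), j.tolist()))


# Hypothetical toy data: rows 0, 2 and 3 agree in columns 0, 1 and 3,
# but only rows 0 and 2 are within k = 4 of each other in column 2
toy = pd.DataFrame([[3, 4, 5, 1],
                    [9, 9, 9, 9],
                    [3, 4, 4, 1],
                    [3, 4, 20, 1]])
pairs = find_pairs(toy, identical_columns=[0, 1, 3], diff_column=2, k=4)
print(pairs)  # [(0, 2)]
```

With float data such as iris, the equality test is exact, so values that differ only by rounding noise would not count as identical.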
