
How can I check if almost all rows are identical and, if so, verify that the final differing row varies by some constant k?

Using the iris data set with dimensions (150, 4), I want to see whether two rows are identical in columns 1, 2 and 4 and, if so, verify that their values in the 3rd column differ by some constant k. This has to be done for every possible row combination.

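#### load data ####
import pandas as pd
from sklearn.datasets import load_iris  # assumed source of `iris`; the original post does not show it

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df.head()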

My attempt below gives me the error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." and also does not account for all the rows.

I chose k=4 here.

list=range(0,148)

for row in list:
    if df.iloc[row,:]==df.iloc[row+1,:]:
        df.iloc[row,2]-df.iloc[row+1,2]<=4
    else:
        print('nothing')

The error message basically tells you how to fix this:

if df.iloc[row,:]==df.iloc[row+1,:]:

It needs to be:

if (df.iloc[row,:]==df.iloc[row+1,:]).all():
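
To see why the bare comparison fails, here is a minimal sketch on a made-up two-row frame (the name `toy`, its column names and its values are illustrative assumptions, not the iris data). Comparing two rows yields a boolean Series with one entry per column, and `.all()` reduces it to the single value that `if` expects.

```python
import pandas as pd

# Hypothetical toy data, not the iris set
toy = pd.DataFrame({"a": [1, 1], "b": [2, 2], "c": [5, 4], "d": [7, 7]})

# Element-wise comparison: a boolean Series, one entry per column
elementwise = toy.iloc[0, :] == toy.iloc[1, :]

# .all() is True only if every column matches
rows_equal = bool(elementwise.all())

# Using the Series directly in `if` raises the ValueError from the question
try:
    if elementwise:
        pass
    raised = False
except ValueError:
    raised = True

print(elementwise.tolist(), rows_equal, raised)
```

Here the two rows differ only in column `c`, so `rows_equal` is False even though three of the four columns match.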

First you need to select the columns you want to compare, in this case columns 0, 1 and 3, with the comparison df.iloc[row1, [0, 1, 3]] == df.iloc[row2, [0, 1, 3]]. This returns a boolean Series with one True or False value per column. But you need all of those columns to be the same, and for that you need the .all() method, which returns True only if every value in the Series is True. In summary:

if (df.iloc[row1,identical_columns] == df.iloc[row2, identical_columns]).all():

And since you need to iterate over every possible row combination, a double for loop will do nicely.

for row1 in range(m-1):
    for row2 in range(row1+1, m):
        # Check for every row combination if the columns are equal
        if (df.iloc[row1,identical_columns] == df.iloc[row2, identical_columns]).all():
            pass
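
As a possible alternative to the nested loops, `itertools.combinations` from the standard library yields each unordered pair of row positions exactly once. The sketch below uses a small made-up frame (`toy` and `matching_pairs` are illustrative names, not part of the original answer) and collects the pairs that agree in the chosen columns.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data: rows 0 and 2 agree in columns 0, 1 and 3
toy = pd.DataFrame([[3, 4, 5, 1],
                    [9, 9, 9, 9],
                    [3, 4, 4, 1]])
identical_columns = [0, 1, 3]

matching_pairs = []
# combinations(range(n), 2) visits (0, 1), (0, 2), (1, 2): every pair once
for row1, row2 in combinations(range(len(toy)), 2):
    same = (toy.iloc[row1, identical_columns] == toy.iloc[row2, identical_columns]).all()
    if same:
        matching_pairs.append((row1, row2))

print(matching_pairs)
```

This visits the same pairs as the double loop, just without the manual index bookkeeping.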

Altogether:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randint(0, 100, (10, 4)))
m = df.shape[0]

identical_columns = [0, 1, 3]
k = 4

# Force rows 2 and 3 to pass
df.iloc[2, :] = [3, 4, 5, 1]
df.iloc[3, :] = [3, 4, 4, 1]

for row1 in range(m-1):
    for row2 in range(row1+1, m):
        # Check for every row combination if the columns are equal
        if (df.iloc[row1, identical_columns] == df.iloc[row2, identical_columns]).all():
            # abs() so the check does not depend on which row holds the larger value
            if abs(df.iloc[row1, 2] - df.iloc[row2, 2]) <= k:
                # TODO: Implement your logic
                print('We pass!')
        else:
            print(f"row {row1} and row {row2} don't pass")
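
The pairwise loop grows quadratically with the number of rows (150 iris rows already give 11,175 pairs). A different technique, not the one used in the answer above, is to group rows on the columns that must match and look at the spread of the remaining column in each group. The sketch below assumes "differs by k" means an absolute difference of at most k, in which case every pair in a group passes exactly when max minus min is at most k. The helper name `groups_within_k` and the toy data are hypothetical.

```python
import pandas as pd


def groups_within_k(df, identical_columns, varying_column, k):
    """Hypothetical helper: for each set of rows sharing the same values in
    identical_columns, report whether the spread of varying_column is <= k."""
    grouped = df.groupby(identical_columns)[varying_column]
    summary = grouped.agg(["count", "min", "max"])
    # If max - min <= k, every pair of rows in the group differs by at most k
    summary["within_k"] = (summary["max"] - summary["min"]) <= k
    # Keep only groups that contain at least one pair of rows
    return summary[summary["count"] > 1]


# Made-up data: two groups with matching columns 0, 1, 3 and one unmatched row
toy = pd.DataFrame([[3, 4, 5, 1],
                    [3, 4, 4, 1],
                    [7, 7, 0, 7],
                    [7, 7, 9, 7],
                    [1, 2, 3, 4]])

result = groups_within_k(toy, identical_columns=[0, 1, 3], varying_column=2, k=4)
print(result)
```

If the individual row pairs are needed rather than a per-group verdict, the explicit loop is still the more direct route.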

