
Check rows within a nested partition have the same values

I have a table with two IDs, and I need to check that, within a particular ID1, every ID2 has the same products and the same number of products.

For example, in the table below, 10001 has 123 and 234, and a row is missing: 123 does not have Product 2.

For 20002, 345 and 456 both have Product 3 and Product 4, but the last product differs. I need to find such cases in my data.

ID1     ID2     Product
10001   123     Product 1
10001   234     Product 1
10001   234     Product 2
20002   345     Product 3
20002   345     Product 4
20002   345     Product 5
20002   456     Product 3
20002   456     Product 4
20002   456     Product 6

The perfect scenario, which would be correct, is:

ID1     ID2     Product
10001   123     Product 1
10001   123     Product 2
10001   234     Product 1
10001   234     Product 2
20002   345     Product 3
20002   345     Product 4
20002   345     Product 5
20002   456     Product 3
20002   456     Product 4
20002   456     Product 5

Basically, I need to find all the cases in my data where, within a particular ID1, the ID2s don't have consistent products. By consistent products I mean that all ID2s within an ID1 should have the same products.

Any suggestions on a way to find the cases in the first table? Thanks!

Imagine you've loaded your data into a dict in which the product list for each (id1, id2) pair is a set (by the way, this also guarantees that products aren't duplicated for an id1, id2):

# Product numbers from the first table in the question, keyed by id1, then id2.
data = {
    10001: {
        123: {1},
        234: {1, 2}
    },
    20002: {
        345: {3, 4, 5},
        456: {3, 4, 6}
    }
}
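If the data starts out as rows rather than as a ready-made dict, the nested structure could be built along these lines. This is only a sketch: `rows` and `build_index` are hypothetical names, the rows are copied from the first table in the question, and products are kept as strings here instead of the bare numbers used above.

```python
from collections import defaultdict

# Rows from the first table in the question, as (ID1, ID2, Product) tuples.
rows = [
    (10001, 123, "Product 1"),
    (10001, 234, "Product 1"),
    (10001, 234, "Product 2"),
    (20002, 345, "Product 3"),
    (20002, 345, "Product 4"),
    (20002, 345, "Product 5"),
    (20002, 456, "Product 3"),
    (20002, 456, "Product 4"),
    (20002, 456, "Product 6"),
]


def build_index(rows):
    """Collect the products of each (id1, id2) pair into a set."""
    index = defaultdict(lambda: defaultdict(set))
    for id1, id2, product in rows:
        index[id1][id2].add(product)  # duplicates collapse automatically
    return index


index = build_index(rows)
```

Because the row order does not matter here, this works on an unsorted file too.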

Then you can check whether two id2 values have the same items by using the '^' operator on sets (see https://docs.python.org/3/library/stdtypes.html#set). For example:

a = data[10001][123]
b = data[10001][234]
c = a ^ b  # c == {2}, so len(c) > 0

'^' calculates the symmetric difference of the two sets, so it returns the empty set if and only if both sets are equal.
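As a small illustration with made-up sets (the variable names are only for this example), the symmetric difference also tells you which items differ, and a plain `==` comparison gives the same yes/no answer:

```python
same_a = {3, 4, 5}
same_b = {5, 4, 3}   # same items, different order
other = {3, 4, 6}

no_diff = same_a ^ same_b   # empty set: the sets are equal
diff = same_a ^ other       # items present in exactly one of the two sets

# An empty symmetric difference and set equality always agree.
equal_by_xor = len(no_diff) == 0
equal_by_eq = same_a == same_b
```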

So you can iterate over all id2 keys for a given id1 and break with a message as soon as the '^' of the current set and the previous one is not empty. Example:

for id1 in data:
    last_seen = None
    for id2 in data[id1]:
        actual = data[id1][id2]
        # A non-empty symmetric difference means the two sets differ.
        if last_seen is not None and len(last_seen ^ actual) != 0:
            print('Items for id1 {} are not equal'.format(id1))
            break
        last_seen = actual
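If you also want to know which rows are missing, one possible variation is sketched below. It assumes that the "expected" products for an id1 are the union of everything seen under that id1, which is my assumption and may not match your rule. `find_inconsistent`, `sample` and the extra id1 30003 are hypothetical and only serve the illustration.

```python
def find_inconsistent(data):
    """Return {id1: {id2: missing products}} for every inconsistent id1.

    Assumption: the expected products of an id1 are the union of the
    products of all its id2 values.
    """
    report = {}
    for id1, by_id2 in data.items():
        expected = set().union(*by_id2.values())
        missing = {
            id2: expected - products
            for id2, products in by_id2.items()
            if products != expected
        }
        if missing:
            report[id1] = missing
    return report


sample = {
    10001: {123: {1}, 234: {1, 2}},
    20002: {345: {3, 4, 5}, 456: {3, 4, 6}},
    30003: {567: {7}, 678: {7}},  # consistent, so it is not reported
}
result = find_inconsistent(sample)
```

Here `result` maps 10001 to the one product missing from 123, and 20002 to the product each of 345 and 456 lacks relative to the other.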

This assumes your CSV file isn't necessarily ordered, which is why you need to load it into a dict. If your file is ordered by the IDs, you can read the file and do the check in a single pass; I'm sure you can adapt this.
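For the ordered case, a single pass could look roughly like the sketch below. It assumes the rows arrive as (id1, id2, product) tuples already sorted by id1 and then id2; `inconsistent_id1s` and `sorted_rows` are hypothetical names, and the 30003 rows are added only to show a consistent group.

```python
from itertools import groupby
from operator import itemgetter


def inconsistent_id1s(sorted_rows):
    """Yield each id1 whose id2 groups do not all share one product set.

    Assumes rows are (id1, id2, product) tuples sorted by id1, then id2.
    """
    for id1, id1_rows in groupby(sorted_rows, key=itemgetter(0)):
        first = None
        consistent = True
        for _id2, id2_rows in groupby(id1_rows, key=itemgetter(1)):
            products = {row[2] for row in id2_rows}
            if first is None:
                first = products      # reference set for this id1
            elif products != first:
                consistent = False    # at least one id2 differs
        if not consistent:
            yield id1


sorted_rows = [
    (10001, 123, "Product 1"),
    (10001, 234, "Product 1"),
    (10001, 234, "Product 2"),
    (20002, 345, "Product 3"),
    (20002, 345, "Product 4"),
    (20002, 345, "Product 5"),
    (20002, 456, "Product 3"),
    (20002, 456, "Product 4"),
    (20002, 456, "Product 6"),
    (30003, 567, "Product 7"),
    (30003, 678, "Product 7"),
]
flagged = list(inconsistent_id1s(sorted_rows))
```

Only the product sets of one id1 are held in memory at a time, which is the main reason to prefer this over the dict when the file is large and already sorted.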
