
Comparing Two Lists of Tuples Based On Each Tuple's First Value (But Returning ALL Tuple Values)

I am trying to compare the output of two different systems by finding devices that are unique to System A, devices that are unique to System B, and finally devices that exist in both systems.

Right now I have my data coming out of both systems as a list of tuples. My example data looks like this:

system_a_devices = [("host1.test.local", "Test 1 Group"), ("host5.testing.lan", "LAN Test Group"), ("server5.hello.local", "Hello Corporation, Inc."), ("desktop1.corp.tld", "Corporate TLD, Ltd.")]

system_b_devices = [("desktop1.corp.tld", "Corporate TLD, Ltd."), ("host1.test.local", "Test One Group"), ("server6.hello.local", "Hello Corporation, Inc.")]

The first value in the tuple is the FQDN of the host and the second value is a descriptive name for the device (in this particular sample it's a customer name). While the customer name is needed in the final result, the names do NOT necessarily need to match (see "Test One Group" and "Test 1 Group", which share the same FQDN). As such, the final result could contain the string "Test 1 Group" OR "Test One Group", as either will work for what I'm trying to accomplish (though System B most likely has the most accurate data for the customer name).

The FQDN (first value in the tuple) should be the only thing considered when determining the unique values from each system. Also, each of the two systems can return its list of devices in any order, and the number of tuples (FQDN/customer name pairings) in each system's list will vary.

My end result should look something like this:

system_a_unique = [("host5.testing.lan", "LAN Test Group"), ("server5.hello.local", "Hello Corporation, Inc.")]

system_b_unique = [("server6.hello.local", "Hello Corporation, Inc.")]

both_systems = [("host1.test.local", "Test One Group"), ("desktop1.corp.tld", "Corporate TLD, Ltd.")]

As I mentioned earlier, the description/customer name COULD come from either system for the "both_systems" list, but System B probably has better/cleaner data, so I'd prefer System B's data if it's not too much extra effort to use it.

How would I efficiently accomplish this task? Would the better question to ask be how I should structure my data output from System A and System B to better accomplish this (i.e. is a list of tuples a bad idea)?

"Would the better question to ask be how should I structure my data output from System A and System B to better accomplish this (ie list of tuples is a bad idea)?"

I have to say that, yes, a simple move to dicts would make this trivial.

system_a_devices = {"host1.test.local": "Test 1 Group", "host5.testing.lan": "LAN Test Group", "server5.hello.local": "Hello Corporation, Inc.", "desktop1.corp.tld": "Corporate TLD, Ltd."}
system_b_devices = {"desktop1.corp.tld": "Corporate TLD, Ltd.", "host1.test.local": "Test One Group", "server6.hello.local": "Hello Corporation, Inc."}
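If the two systems can only hand back lists of tuples, the dicts above do not have to be typed out by hand. A minimal sketch, assuming Python 3 and that each FQDN normally appears once per system (the names `system_a_pairs`, `system_a_by_fqdn` and `duplicates` are made up for illustration):

```python
# Data in the question's original list-of-tuples form.
system_a_pairs = [
    ("host1.test.local", "Test 1 Group"),
    ("host5.testing.lan", "LAN Test Group"),
    ("server5.hello.local", "Hello Corporation, Inc."),
    ("desktop1.corp.tld", "Corporate TLD, Ltd."),
]

# dict() accepts any iterable of (key, value) pairs.
system_a_by_fqdn = dict(system_a_pairs)

# Caveat: if an FQDN is repeated, the last pair silently wins.
duplicates = dict([
    ("host1.test.local", "old name"),
    ("host1.test.local", "new name"),
])
```

If repeated FQDNs are possible in the real data, it is probably worth checking for them before converting, since the conversion itself will not complain.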

Now you can just do straightforward list comprehensions:

system_a_unique = [tup for tup in system_a_devices.items() if tup[0] not in system_b_devices]
system_b_unique = [tup for tup in system_b_devices.items() if tup[0] not in system_a_devices]
both_systems = [tup for tup in system_b_devices.items() if tup[0] in system_a_devices]
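For what it's worth, here is a self-contained run of those comprehensions against the sample data, written with tuple unpacking instead of `tup[0]`. It assumes Python 3.7+, where dicts keep insertion order; on older versions the order of the results is not guaranteed. Because `both_systems` iterates over System B's dict, the shared hosts pick up System B's customer names.

```python
system_a_devices = {
    "host1.test.local": "Test 1 Group",
    "host5.testing.lan": "LAN Test Group",
    "server5.hello.local": "Hello Corporation, Inc.",
    "desktop1.corp.tld": "Corporate TLD, Ltd.",
}
system_b_devices = {
    "desktop1.corp.tld": "Corporate TLD, Ltd.",
    "host1.test.local": "Test One Group",
    "server6.hello.local": "Hello Corporation, Inc.",
}

# "fqdn in some_dict" is a key lookup, so each membership test is O(1) on average.
system_a_unique = [(fqdn, name) for fqdn, name in system_a_devices.items()
                   if fqdn not in system_b_devices]
system_b_unique = [(fqdn, name) for fqdn, name in system_b_devices.items()
                   if fqdn not in system_a_devices]
# Iterating over System B means the names here come from System B.
both_systems = [(fqdn, name) for fqdn, name in system_b_devices.items()
                if fqdn in system_a_devices]

print(system_a_unique)
print(system_b_unique)
print(both_systems)
```

The three lists contain the same tuples as the expected output in the question; only the order within `both_systems` differs, since it follows System B's order.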

You can use set operations on the FQDNs to find which are unique to each system and which are on both, and then use dicts to look up the device names based on the FQDNs:

# create FQDN -> device name dicts for each system
devices_a = dict(system_a_devices)
devices_b = dict(system_b_devices)

# create a set of FQDNs for each system
fqdn_set_a = set(devices_a.keys())
fqdn_set_b = set(devices_b.keys())

# compute FQDNs unique to each system and those found on both
unique_fqdns_a = fqdn_set_a - fqdn_set_b
unique_fqdns_b = fqdn_set_b - fqdn_set_a
non_unique_fqdns = fqdn_set_a & fqdn_set_b

# now add device names using the FQDN -> device name dicts
system_a_unique = [(fqdn, devices_a[fqdn]) for fqdn in unique_fqdns_a]
system_b_unique = [(fqdn, devices_b[fqdn]) for fqdn in unique_fqdns_b]
# note: for FQDNs found on both systems, use the device name from system B
both_systems = [(fqdn, devices_b[fqdn]) for fqdn in non_unique_fqdns]
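Since sets are unordered, the lists built this way can come out in a different order from run to run. If a stable order matters, one option is to wrap the steps in a function and sort the FQDNs. This is only a sketch, assuming Python 3; the function name `compare_by_fqdn` is hypothetical, and it leans on the fact that dict key views support set operations directly:

```python
def compare_by_fqdn(a_pairs, b_pairs):
    """Split two lists of (fqdn, name) tuples by FQDN membership.

    Returns (only_in_a, only_in_b, in_both), each sorted by FQDN.
    Names for shared FQDNs are taken from the second list (System B).
    """
    names_a = dict(a_pairs)
    names_b = dict(b_pairs)

    # Key views behave like sets, so - and & work without calling set().
    only_a = sorted(names_a.keys() - names_b.keys())
    only_b = sorted(names_b.keys() - names_a.keys())
    shared = sorted(names_a.keys() & names_b.keys())

    return (
        [(fqdn, names_a[fqdn]) for fqdn in only_a],
        [(fqdn, names_b[fqdn]) for fqdn in only_b],
        [(fqdn, names_b[fqdn]) for fqdn in shared],
    )


system_a_devices = [("host1.test.local", "Test 1 Group"),
                    ("host5.testing.lan", "LAN Test Group"),
                    ("server5.hello.local", "Hello Corporation, Inc."),
                    ("desktop1.corp.tld", "Corporate TLD, Ltd.")]
system_b_devices = [("desktop1.corp.tld", "Corporate TLD, Ltd."),
                    ("host1.test.local", "Test One Group"),
                    ("server6.hello.local", "Hello Corporation, Inc.")]

system_a_unique, system_b_unique, both_systems = compare_by_fqdn(
    system_a_devices, system_b_devices)
```

Sorting costs a little extra, so it can be dropped if the order of the results does not matter.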
