Compare two lists of tuples based on the first value of each tuple (but return all tuple values)

Question

I am trying to compare the output of two different systems by finding devices that are unique to System A, devices that are unique to System B, and devices that exist in both systems.

Right now, the data from both systems comes out as a list of tuples. My example data looks like this:

system_a_devices = [("host1.test.local", "Test 1 Group"), ("host5.testing.lan", "LAN Test Group"), ("server5.hello.local", "Hello Corporation, Inc."), ("desktop1.corp.tld", "Corporate TLD, Ltd.")]

system_b_devices = [("desktop1.corp.tld", "Corporate TLD, Ltd."), ("host1.test.local", "Test One Group"), ("server6.hello.local", "Hello Corporation, Inc.")]

The first value in each tuple is the FQDN of the host and the second value is a descriptive name for the device (in this particular sample it's a customer name). While the customer name is needed in the final result, the names do NOT necessarily need to match (see "Test One Group" and "Test 1 Group", which share the same FQDN). As such, the final result could contain the string "Test 1 Group" OR "Test One Group", as either will work for what I'm trying to accomplish (though System B most likely has the more accurate data for the customer name).

The FQDN (first value in the tuple) should be the only thing considered when determining the unique values from each system. Also, each of the two systems can return its list in any order, and the number of tuples (FQDN/customer name pairings) in each list will vary.

My end result should look something like this:

system_a_unique = [("host5.testing.lan", "LAN Test Group"), ("server5.hello.local", "Hello Corporation, Inc.")]

system_b_unique = [("server6.hello.local", "Hello Corporation, Inc.")]

both_systems = [("host1.test.local", "Test One Group"), ("desktop1.corp.tld", "Corporate TLD, Ltd.")]

As I mentioned earlier, the description/customer name COULD come from either system for the "both_systems" list, but System B probably has better/cleaner data, so I'd prefer System B's data if it's not too much extra effort.

How would I efficiently accomplish this task? Would the better question to ask be how I should structure my data output from System A and System B to better accomplish this (i.e., is a list of tuples a bad idea)?

Answer 1

Would the better question to ask be how I should structure my data output from System A and System B to better accomplish this (i.e., is a list of tuples a bad idea)?

I have to say that, yes, a simple move to dicts would make this trivial.

system_a_devices = {"host1.test.local": "Test 1 Group", "host5.testing.lan": "LAN Test Group", "server5.hello.local": "Hello Corporation, Inc.", "desktop1.corp.tld": "Corporate TLD, Ltd."}
system_b_devices = {"desktop1.corp.tld": "Corporate TLD, Ltd.", "host1.test.local": "Test One Group", "server6.hello.local": "Hello Corporation, Inc."}

Now you can just do straightforward list comprehensions:

system_a_unique = [tup for tup in system_a_devices.items() if tup[0] not in system_b_devices]
system_b_unique = [tup for tup in system_b_devices.items() if tup[0] not in system_a_devices]
both_systems = [tup for tup in system_b_devices.items() if tup[0] in system_a_devices]
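If the two systems still hand back lists of tuples, as in the question, the dicts do not have to be typed out by hand: `dict()` accepts a list of `(key, value)` pairs directly. The following is a minimal sketch using the question's sample data. The names `system_a_list` and `system_b_list` are made up for illustration, and the sketch assumes each FQDN appears at most once per list (a repeated FQDN would silently keep the last name seen).

```python
# Sample data from the question, still in list-of-tuples form.
# (system_a_list / system_b_list are illustrative names.)
system_a_list = [("host1.test.local", "Test 1 Group"),
                 ("host5.testing.lan", "LAN Test Group"),
                 ("server5.hello.local", "Hello Corporation, Inc."),
                 ("desktop1.corp.tld", "Corporate TLD, Ltd.")]
system_b_list = [("desktop1.corp.tld", "Corporate TLD, Ltd."),
                 ("host1.test.local", "Test One Group"),
                 ("server6.hello.local", "Hello Corporation, Inc.")]

# Each tuple is a (key, value) pair, so dict() converts the list directly.
system_a_devices = dict(system_a_list)
system_b_devices = dict(system_b_list)

# Membership tests against a dict check its keys, i.e. the FQDNs only.
system_a_unique = [tup for tup in system_a_devices.items() if tup[0] not in system_b_devices]
system_b_unique = [tup for tup in system_b_devices.items() if tup[0] not in system_a_devices]
# Iterating over System B's items means shared hosts carry System B's name.
both_systems = [tup for tup in system_b_devices.items() if tup[0] in system_a_devices]

print(system_a_unique)
print(system_b_unique)
print(both_systems)
```

Because `both_systems` is built from System B's items, "host1.test.local" comes out paired with "Test One Group", which matches the preference stated in the question.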

Answer 2

You can use set operations on the FQDNs to find which are unique to each system and which are on both, and then use dicts to look up the device names based on the FQDNs:

# create FQDN -> device name dicts for each system
devices_a = dict(system_a_devices)
devices_b = dict(system_b_devices)

# create a set of FQDNs for each system
fqdn_set_a = set(devices_a.keys())
fqdn_set_b = set(devices_b.keys())

# compute FQDNs unique to each system and those found on both
unique_fqdns_a = fqdn_set_a - fqdn_set_b
unique_fqdns_b = fqdn_set_b - fqdn_set_a
non_unique_fqdns = fqdn_set_a & fqdn_set_b

# now add device names using the FQDN -> device name dicts
system_a_unique = [(fqdn, devices_a[fqdn]) for fqdn in unique_fqdns_a]
system_b_unique = [(fqdn, devices_b[fqdn]) for fqdn in unique_fqdns_b]
# note: for FQDNs found on both systems, use the device name from system B
both_systems = [(fqdn, devices_b[fqdn]) for fqdn in non_unique_fqdns]
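The same set-based approach can be wrapped in a small function. This is only a sketch: the function name `compare_devices` is hypothetical, it relies on Python 3 dict key views supporting set operators (`-`, `&`) directly, and it sorts each result because sets have no guaranteed order. As above, it assumes an FQDN appears at most once per input list.

```python
def compare_devices(list_a, list_b):
    """Split two lists of (fqdn, name) tuples into A-only, B-only and shared.

    Only the FQDN is compared; shared entries take their name from list_b.
    """
    devices_a = dict(list_a)
    devices_b = dict(list_b)

    # In Python 3, dict.keys() returns a set-like view, so - and & work on it.
    only_a = devices_a.keys() - devices_b.keys()
    only_b = devices_b.keys() - devices_a.keys()
    shared = devices_a.keys() & devices_b.keys()

    # Sets are unordered, so sort by FQDN to get a repeatable result.
    return (
        sorted((fqdn, devices_a[fqdn]) for fqdn in only_a),
        sorted((fqdn, devices_b[fqdn]) for fqdn in only_b),
        sorted((fqdn, devices_b[fqdn]) for fqdn in shared),
    )


system_a_devices = [("host1.test.local", "Test 1 Group"),
                    ("host5.testing.lan", "LAN Test Group"),
                    ("server5.hello.local", "Hello Corporation, Inc."),
                    ("desktop1.corp.tld", "Corporate TLD, Ltd.")]
system_b_devices = [("desktop1.corp.tld", "Corporate TLD, Ltd."),
                    ("host1.test.local", "Test One Group"),
                    ("server6.hello.local", "Hello Corporation, Inc.")]

system_a_unique, system_b_unique, both_systems = compare_devices(
    system_a_devices, system_b_devices)
print(system_a_unique)
print(system_b_unique)
print(both_systems)
```

Sorting is optional; drop it if the order of the results does not matter.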

Solution 1: 1, accepted, 2013-11-08 00:37:21

Solution 2: 0, 2013-11-07 23:52:24
