Match two lists by index and name

How can I compare two lists and create an output list in which the common items are moved so that they match the main list by index and name? The main list is created once and stays the same throughout the script.

There can be situations where the changing list contains items that do not exist in the main list; I'd like to collect these items in a separate list.

Example:

main_list = ['apple', 'orange', 'banana', 'pear', 'mango', 'peach', 'strawberry']
changing_list = ['apple', 'banana', 'cucumber', 'peach', 'pear', 'fish']

output = ['apple', 'NA', 'banana', 'pear', 'NA', 'peach', 'NA']
added_output = ['cucumber', 'fish']

Using the sorted() function on each list before comparing them may be of some use; however, I can't work out how to indicate that 'orange', for example, is missing (preferably with NA or X). I am aware of the option of using sets and the & operator, but that does not show which item is missing from an index/position perspective (the NA part).
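To make the limitation concrete: the & operator does find the shared items, but a set has no positions, so it cannot say where the gaps are. A small illustration using the example lists from the question:

```python
main_list = ['apple', 'orange', 'banana', 'pear', 'mango', 'peach', 'strawberry']
changing_list = ['apple', 'banana', 'cucumber', 'peach', 'pear', 'fish']

# & finds the items both lists share...
common = set(main_list) & set(changing_list)

# ...but a set has no indices, so nothing here records that 'orange',
# 'mango' and 'strawberry' are the slots of main_list that are missing.
print(sorted(common))  # ['apple', 'banana', 'peach', 'pear']
```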

You can do this with sets and list comprehensions:

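def ordered_intersection(main_list, changing_list):
    # Set membership tests take constant time on average
    changing_set = set(changing_list)
    # Walk main_list so output keeps its order; missing items become 'NA'
    output = [x if x in changing_set else 'NA' for x in main_list]

    # Items of changing_list that never appear in main_list, in their original order
    main_set = set(main_list)
    added_output = [x for x in changing_list if x not in main_set]

    return output, added_output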

It works as follows:

>>> main_list = ['apple', 'orange', 'banana', 'pear', 'mango', 'peach', 'strawberry']
>>> changing_list = ['apple', 'banana', 'cucumber', 'peach', 'pear', 'fish']
>>> ordered_intersection(main_list, changing_list)
(['apple', 'NA', 'banana', 'pear', 'NA', 'peach', 'NA'], ['cucumber', 'fish'])

Explanation of the code above:

  • First, convert changing_list to a set, since a set membership test takes constant time on average, whereas a list membership test takes linear time.
  • Since we want output to keep the order of main_list, we traverse every element of that list and check whether it exists in changing_set. Doing these checks against a set rather than a list avoids quadratic overall running time and keeps the whole pass linear.
  • The same logic is applied to added_output: traverse changing_list in order and keep the items that do not appear in the main list, again checking membership against a set.
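The question mentions either NA or X as the marker for a missing slot. A hedged variation of the function above takes the marker as a parameter; the `placeholder` keyword is an assumption for illustration, not part of the answer. It checks the extra items against `set(main_list)`, so the marker can never be mistaken for a real item.

```python
def ordered_intersection(main_list, changing_list, placeholder='NA'):
    """Align changing_list to main_list; `placeholder` (hypothetical) marks gaps."""
    changing_set = set(changing_list)  # O(1) average membership tests
    main_set = set(main_list)
    # One pass over main_list keeps its order; absent items become the placeholder.
    output = [x if x in changing_set else placeholder for x in main_list]
    # One pass over changing_list keeps the extra items in their original order.
    added_output = [x for x in changing_list if x not in main_set]
    return output, added_output


main_list = ['apple', 'orange', 'banana', 'pear', 'mango', 'peach', 'strawberry']
changing_list = ['apple', 'banana', 'cucumber', 'peach', 'pear', 'fish']
print(ordered_intersection(main_list, changing_list, placeholder='X'))
# (['apple', 'X', 'banana', 'pear', 'X', 'peach', 'X'], ['cucumber', 'fish'])
```

With the default `placeholder='NA'` it behaves like the original function.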

Assuming that you don't care about duplicates (or about the order of the extra items), you can use sets to find the differences efficiently:

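output = []
main_set, changing_set = set(main_list), set(changing_list)
for i in main_list:
    # Keep the item if changing_list also has it, otherwise mark the slot "NA"
    output.append(i if i in changing_set else "NA")
# Items only in changing_list (a set, so their order is not preserved)
added_output = changing_set - main_set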
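In this set-based version, `changing_set - main_set` returns a set, so the extra items come back de-duplicated but in arbitrary order. If their order in changing_list matters while still dropping repeats, one possible sketch filters `dict.fromkeys(changing_list)` instead; the repeated 'cucumber' below is added only to show the de-duplication:

```python
main_list = ['apple', 'orange', 'banana', 'pear', 'mango', 'peach', 'strawberry']
# The question's changing list, with 'cucumber' repeated for illustration.
changing_list = ['apple', 'banana', 'cucumber', 'peach', 'pear', 'fish', 'cucumber']

main_set = set(main_list)

# Correct members, arbitrary order, duplicates collapsed.
unordered = set(changing_list) - main_set

# dict.fromkeys keeps the first occurrence of each item in insertion order
# (dicts preserve insertion order since Python 3.7), so this has the same
# members as `unordered` but follows the order of changing_list.
added_output = [x for x in dict.fromkeys(changing_list) if x not in main_set]

print(added_output)  # ['cucumber', 'fish']
```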

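The following approach matches the two lists by index and name using plain loops. Because `in` on a list scans it element by element, this takes time proportional to len(main_list) × len(changing_list), which is fine for short lists: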

>>> main_list = ['apple', 'orange', 'banana', 'pear', 'mango', 'peach', 'strawberry']
>>> changing_list = ['apple', 'banana', 'cucumber', 'peach', 'pear', 'fish']
>>> output = []
>>> for word in main_list:
...     if word in changing_list:
...         output.append(word)
...     else:
...         output.append('NA')
...
>>> output
['apple', 'NA', 'banana', 'pear', 'NA', 'peach', 'NA']

>>> added_output = []
>>> for word in changing_list:
...     if word not in main_list:
...         added_output.append(word)
...
>>> added_output
['cucumber', 'fish']
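All three answers treat membership as yes/no, so repeated items are not counted: if main_list held 'apple' twice, a single 'apple' in changing_list would fill both slots. If each occurrence in changing_list should fill at most one slot, one possible sketch uses `collections.Counter` as a multiset. This matching rule is an assumption, and `match_with_counts` is a hypothetical name; it still runs in linear time like the set version.

```python
from collections import Counter


def match_with_counts(main_list, changing_list, placeholder='NA'):
    """Hypothetical variant: each item in changing_list can fill at most one slot."""
    remaining = Counter(changing_list)  # unused copies of each item
    output = []
    for word in main_list:
        if remaining[word] > 0:  # a missing key reads as 0 in a Counter
            output.append(word)
            remaining[word] -= 1  # consume one copy
        else:
            output.append(placeholder)
    # Copies not consumed above are extra; walk changing_list to keep its order.
    added_output = []
    for word in changing_list:
        if remaining[word] > 0:
            added_output.append(word)
            remaining[word] -= 1
    return output, added_output


print(match_with_counts(['apple', 'apple', 'pear'], ['apple', 'pear', 'pear', 'fish']))
# (['apple', 'NA', 'pear'], ['pear', 'fish'])
```

On the question's example lists, which contain no repeats, it gives the same result as the answers above.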
