
Python - match/compare two data/tuple lists

I have two sets of data. One comes from a queryset in Django, which returns data as below:

sr_data = ShowroomConfigData.objects.only('location').filter(is_showroom=True).exclude(location='MajorSite')
for i in sr_data:
    print(i.location)

London
Glasgow
Edinburgh
...

The second set of data comes from an external MySQL query that returns a list of tuples:

query = """long mysql query..."""
cur.execute(query)
esr_data = cur.fetchall()
for i in esr_data:
    print(i[3])

London
Glasgow
Edinburgh
...

esr_data sample:

('John Smith', '0123456789', 'billy', 'London', 'London', datetime.date(2014, 12, 19), '0123456789', 'Bobs Builders', '123 place', 'city', 'add', 'London', 'LDN 103', ', '', '', ' ', '', '', ' ', ' ', ' ', ' ', '', ' ', '')

They're not necessarily in that order either; I think the order of both is random.

But the external query has some details I want to import into Django on a regular basis.

So I need to loop over both lists and import data into Django when they match. The only problem is that, when looping over them, they will likely never match.

Does anyone know of a way I can make this work?

Thanks

So, as I understand it, you have two variables which are both iterables. Each contains some values, and you want to find the items in one that match items in the other.

So, a naive way of doing it is:

for i in esr_data:
    for j in sr_data:
        if i[3] == j.location:
            pass  # Import into django as they match

But this is not very nice, as it is O(M * N), where M is the number of rows in esr_data and N is the number of rows in sr_data.

Are your M and N very large? If not, this could work.

To reduce complexity for large data, first find the locations common to both:

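# Locations present in both data sets
common_locations = set(d.location for d in sr_data) & set(d[3] for d in esr_data)
# Lists rather than filter(): new_sr_data is iterated once per outer row,
# and a Python 3 filter object would be exhausted after the first pass
new_sr_data = [d for d in sr_data if d.location in common_locations]
new_esr_data = [d for d in esr_data if d[3] in common_locations]
for i in new_esr_data:
    for j in new_sr_data:
        if i[3] == j.location:
            pass  # Import into django as they match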

This reduces the nested loop to O(L1 * L2), where L1 and L2 are the numbers of rows in each data set whose location appears in both, at the cost of one extra pass over each data set to build the sets.
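The nested loop can also be avoided altogether. The following is only a sketch of a different technique (a dict index, not the set-filter approach above): group the external rows by location once, then make a single pass over the Django rows. The `Showroom` namedtuple is a hypothetical stand-in for `ShowroomConfigData` instances, the sample rows are made up, and the location is assumed to sit at index 3 as in the question.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for ShowroomConfigData instances (only .location is used).
Showroom = namedtuple("Showroom", ["location"])

sr_data = [Showroom("London"), Showroom("Glasgow"), Showroom("Edinburgh")]
# Made-up rows shaped like cur.fetchall() output; index 3 holds the location.
esr_data = [
    ("John Smith", "0123456789", "billy", "London"),
    ("Jane Doe", "0987654321", "jd", "Leeds"),
    ("Ann Lee", "0111111111", "al", "Glasgow"),
    ("Bob Ray", "0222222222", "br", "London"),
]

# Group external rows by location: one pass over esr_data, O(M).
rows_by_location = defaultdict(list)
for row in esr_data:
    rows_by_location[row[3]].append(row)

# One pass over sr_data, O(N); each dict lookup is O(1) on average.
matches = []
for showroom in sr_data:
    for row in rows_by_location.get(showroom.location, []):
        matches.append((showroom, row))  # import into Django here
```

With this layout the total work is roughly O(M + N) plus the number of matching pairs, and neither data set needs to be sorted first.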

It looks like esr_data returns a tuple of elements, not a list of tuples.

So you just need to compare the lists for common elements. You can use sets for this:

# Compare location strings, since sr_data holds model instances
result = list(set(d.location for d in sr_data).intersection(esr_data))

Having read through the other answers, I'm beginning to question my own assumptions, and think the actual question is more like:

Two lists of tuples; find the rows that appear in both, based on specific fields (e.g. fields 1, 3 and 5 match in both).

Based on that problem, you can build a set of the "bits that match" from the first list, and then pull the items from the second list that match anything in that set. For example, the following filters on the first and last items in each tuple:

x = [(1,2,3),(2,3,4),(3,4,5),(4,5,6)]
y = [(2,9,4),(4,9,6),(7,8,9),(1,1,1)]

set_x = set((i[0], i[2]) for i in x) # Set of unique keys in x

for item in y:
    if (item[0], item[2]) in set_x:
        pass  # Do some importing here
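The same idea can be wrapped in a small helper so the key fields are passed in rather than hard-coded. This is a sketch only; `match_on_fields` and its parameter names are made up for illustration.

```python
def match_on_fields(first, second, fields):
    """Return rows of `second` whose values at `fields` also occur in a row of `first`."""
    # Build the set of keys once, so each later membership test is O(1) on average.
    keys = set(tuple(row[f] for f in fields) for row in first)
    return [row for row in second if tuple(row[f] for f in fields) in keys]


x = [(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6)]
y = [(2, 9, 4), (4, 9, 6), (7, 8, 9), (1, 1, 1)]

# Match on the first and last item of each tuple (indexes 0 and 2).
matched = match_on_fields(x, y, fields=(0, 2))
print(matched)  # [(2, 9, 4), (4, 9, 6)]
```

For the data in the question, matching on location alone would presumably mean `fields=(3,)` on the external rows, with the Django side reduced to tuples first.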

I've included my original answer below, just in case it's of use.


Based on the assumption that the problem is:

  • There are two iterables of tuples
  • "Matching" tuples from each list contain all the same values, but in different orders (i.e. the fields are in different orders, but there are the same number of values, and all of those values need to match)
  • You want a new list of tuples to import, made up of the rows whose values exist in both of the original lists

One option would be to convert the lists of tuples into sets of (frozen) sets, and then simply take the intersection, i.e. all the frozensets that exist in both sets. For example:

x = [(1,2,3),(2,3,4),(3,4,5),(4,5,6)]
y = [(2,3,4),(4,5,6),(7,8,9),(1,1,1)]
set_x = set(frozenset(i) for i in x)
set_y = set(frozenset(i) for i in y)
matches = set_x.intersection(set_y)
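A short worked variation, with the rows of y reordered to show the order-insensitive match. It also shows one caveat of frozensets (repeated values collapse) and a possible alternative using sorted tuples; that alternative is an assumption on my part and only works when the values in a row can be compared with each other, which may not hold for rows mixing strings and dates.

```python
x = [(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6)]
y = [(4, 3, 2), (6, 4, 5), (7, 8, 9), (1, 1, 1)]

# frozenset ignores order, so (2, 3, 4) and (4, 3, 2) compare equal.
set_x = set(frozenset(i) for i in x)
set_y = set(frozenset(i) for i in y)
matches = set_x.intersection(set_y)
# matches holds frozenset({2, 3, 4}) and frozenset({4, 5, 6})

# Caveat: frozenset also drops repeated values, so (1, 1, 2) and (1, 2, 2)
# would be treated as the same row.
assert frozenset((1, 1, 2)) == frozenset((1, 2, 2))

# A sorted tuple keeps duplicates while still ignoring order
# (this requires the values in a row to be mutually comparable).
sorted_x = set(tuple(sorted(i)) for i in x)
sorted_matches = [row for row in y if tuple(sorted(row)) in sorted_x]
print(sorted_matches)  # [(4, 3, 2), (6, 4, 5)]
```

Filtering y against the key set, rather than intersecting two sets, also returns the original tuples, which is usually what is needed for the import step.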
