
SQLite compare query Python

I've been trying to figure out the best way to write a query that compares the rows in two tables. My goal is to check whether the two tuples in result set A are present in the larger result set B. I only want to see the tuples that differ.

SELECT table1.field_b, table1.field_c, table1.field_d
FROM table1
ORDER BY field_b

results_a = [(101010101, 111111111, 999999999), (121212121, 222222222, 999999999)]

SELECT table2.field_a, table2.fieldb, table3.field3
FROM table2
ORDER BY field_a

results_b = [(101010101, 111111111, 999999999), (121212121, 333333333, 999999999), (303030303, 444444444, 999999999)]

So what I want to do is take results_a and make sure each tuple has an exact match somewhere in results_b. Because the second field of the second tuple in results_b (333333333) differs from the corresponding tuple in results_a (222222222), I would like to get back the second tuple of results_a.
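The exact-match part can be done inside SQLite itself with the EXCEPT compound operator, which returns the rows of the first SELECT that have no identical row in the second SELECT, independent of row order. A minimal sketch with in-memory sample data; it assumes table2's three columns are named field_a, field_b and field_c (placeholder names, since the second query above mixes table2 and table3 columns).

```python
import sqlite3

# In-memory database with the question's sample rows.
# table2's column names (field_a, field_b, field_c) are assumed placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (field_b INTEGER, field_c INTEGER, field_d INTEGER)")
conn.execute("CREATE TABLE table2 (field_a INTEGER, field_b INTEGER, field_c INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (101010101, 111111111, 999999999),
    (121212121, 222222222, 999999999),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [
    (101010101, 111111111, 999999999),
    (121212121, 333333333, 999999999),
    (303030303, 444444444, 999999999),
])

# EXCEPT keeps table1 rows that have no identical row in table2.
no_exact_match = conn.execute("""
    SELECT field_b, field_c, field_d FROM table1
    EXCEPT
    SELECT field_a, field_b, field_c FROM table2
""").fetchall()
print(no_exact_match)  # [(121212121, 222222222, 999999999)]
```

Note that EXCEPT also removes duplicate rows from its result.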

Ultimately, I would like to return a mapping that also contains the tuple from the other set that did not match, so I can reference both in my program. Ideally, since the row with primary key 121212121 (field_b in table1, field_a in table2) does not match completely, I would want to get results_c = {(121212121, 222222222, 999999999): (121212121, 333333333, 999999999)}. This is complicated by the fact that the results from the two tables will not be in the same order, so I can't write code that says "compare tuple 2 in results_a to tuple 2 in results_b". It is more like: take tuple 2 in results_a and see whether it matches any record in results_b; if the primary keys match but no tuple in results_b matches completely, or no partial match is found at all, return the records that don't match.
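One way to build results_c directly in a query is to join the two tables on the primary key and keep only the pairs whose other columns differ. A sketch under the same assumptions (in-memory sample data, placeholder column names for table2); `IS NOT` is SQLite's NULL-safe form of `<>`.

```python
import sqlite3

# Same sample data as the question; table2 column names are assumed placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (field_b INTEGER, field_c INTEGER, field_d INTEGER)")
conn.execute("CREATE TABLE table2 (field_a INTEGER, field_b INTEGER, field_c INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (101010101, 111111111, 999999999),
    (121212121, 222222222, 999999999),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [
    (101010101, 111111111, 999999999),
    (121212121, 333333333, 999999999),
    (303030303, 444444444, 999999999),
])

# Pair rows that share a primary key; keep pairs where any other column differs.
rows = conn.execute("""
    SELECT t1.field_b, t1.field_c, t1.field_d,
           t2.field_a, t2.field_b, t2.field_c
    FROM table1 AS t1
    JOIN table2 AS t2 ON t2.field_a = t1.field_b
    WHERE t1.field_c IS NOT t2.field_b OR t1.field_d IS NOT t2.field_c
""").fetchall()

# Split each six-column row into (table1 row) -> (table2 row).
results_c = {row[:3]: row[3:] for row in rows}
print(results_c)
# {(121212121, 222222222, 999999999): (121212121, 333333333, 999999999)}
```

The inner JOIN drops table1 rows whose key is absent from table2; a LEFT JOIN would keep them, with None in the table2 columns.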

I apologize that this is so wordy; I couldn't think of a better way to explain it. Any help would be much appreciated.

Thanks!

UPDATED EFFORT ON PARTIAL MATCHES

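a = [(1, 2, 3), (4, 5, 7)]
b = [(1, 2, 3), (4, 5, 6)]
pmatch = {}

def partial_match(x, y):
    # True when at least two positions hold equal values
    return sum(ea == eb for (ea, eb) in zip(x, y)) >= 2

for el_a in a:
    # Every tuple in b that partially matches el_a (exact matches included)
    pmatch[el_a] = [el_b for el_b in b if partial_match(el_a, el_b)]
print(pmatch)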

Output: {(4, 5, 7): [(4, 5, 6)], (1, 2, 3): [(1, 2, 3)]}. I would have expected it to be just {(4, 5, 7): (4, 5, 6)}, because those are the only tuples that differ. Any ideas?
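The exact match (1, 2, 3) appears because `partial_match` only requires at least two equal positions, and an exact match has three. A minimal sketch of one fix on the same `a` and `b`: skip tuples that occur verbatim in `b`, and drop entries that found no partial match.

```python
a = [(1, 2, 3), (4, 5, 7)]
b = [(1, 2, 3), (4, 5, 6)]

def partial_match(x, y):
    # True when at least two positions hold equal values
    return sum(ea == eb for ea, eb in zip(x, y)) >= 2

exact = set(b)  # fast membership test for exact matches
pmatch = {}
for el_a in a:
    if el_a in exact:
        continue  # exact match: nothing to report
    candidates = [el_b for el_b in b if partial_match(el_a, el_b)]
    if candidates:  # keep only tuples that have a partial match
        pmatch[el_a] = candidates
print(pmatch)  # {(4, 5, 7): [(4, 5, 6)]}
```

The values stay lists because more than one tuple in `b` could partially match the same tuple in `a`.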

Take results_a and make sure each tuple has an exact match somewhere in results_b:

for el in results_a:
    if el in results_b:
        ...  # el has an exact match in results_b
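Tuples are hashable, so the same exact-match check can use a set; each `in` test is then a hash lookup instead of a scan of the list, and row order stops mattering. A sketch on the question's sample data (`lookup_b` and `exact_matches` are illustrative names):

```python
results_a = [(101010101, 111111111, 999999999), (121212121, 222222222, 999999999)]
results_b = [(101010101, 111111111, 999999999), (121212121, 333333333, 999999999),
             (303030303, 444444444, 999999999)]

# Build the lookup once; membership tests on a set are O(1) on average.
lookup_b = set(results_b)

exact_matches = [el for el in results_a if el in lookup_b]
print(exact_matches)  # [(101010101, 111111111, 999999999)]
```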

Get partial matches:

pmatch = {}

def partial_match(a, b):
    # for instance: at least two of the fields are equal
    return sum(ea == eb for (ea, eb) in zip(a, b)) >= 2

for el_a in results_a:
    pmatch[el_a] = [el_b for el_b in results_b if partial_match(el_a, el_b)]
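If the first field really is a unique primary key in both result sets (an assumption), the pairwise partial matching can be replaced by a lookup keyed on that field, which produces the results_c mapping from the question directly. A sketch; `b_by_key` is a hypothetical helper name:

```python
results_a = [(101010101, 111111111, 999999999), (121212121, 222222222, 999999999)]
results_b = [(101010101, 111111111, 999999999), (121212121, 333333333, 999999999),
             (303030303, 444444444, 999999999)]

# Assumes element 0 is the primary key (field_b in table1, field_a in table2)
# and that keys are unique within results_b.
b_by_key = {row[0]: row for row in results_b}

# Same key on both sides, but the full rows differ.
results_c = {
    row_a: b_by_key[row_a[0]]
    for row_a in results_a
    if row_a[0] in b_by_key and b_by_key[row_a[0]] != row_a
}
print(results_c)
# {(121212121, 222222222, 999999999): (121212121, 333333333, 999999999)}
```

Rows of results_a whose key is missing from results_b are skipped here; `[r for r in results_a if r[0] not in b_by_key]` would list them.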

Return the records that don't match:

no_match = [el for el in results_a if el not in results_b]
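With sets, the non-matching records can be collected in both directions, which also reports rows that exist only in results_b. Sets discard duplicates and ordering, so this sketch sorts the results for display:

```python
results_a = [(101010101, 111111111, 999999999), (121212121, 222222222, 999999999)]
results_b = [(101010101, 111111111, 999999999), (121212121, 333333333, 999999999),
             (303030303, 444444444, 999999999)]

# Set difference ignores row order; sort to get a stable, readable result.
only_in_a = sorted(set(results_a) - set(results_b))
only_in_b = sorted(set(results_b) - set(results_a))
print(only_in_a)  # [(121212121, 222222222, 999999999)]
print(only_in_b)  # [(121212121, 333333333, 999999999), (303030303, 444444444, 999999999)]
```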

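EDIT: another possible partial_match, which counts a pair as a partial match when more than 60% of the fields agree but not all of them, so exact matches are excluded: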

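def partial_match(x, y):
    # Count the positions where the two tuples agree
    nb_matches = sum(ea == eb for (ea, eb) in zip(x, y))
    # Partial match: more than 60% of the fields agree, but not all of them
    return 0.6 < float(nb_matches) / len(x) < 1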
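Applying this ratio-based partial_match to the question's data, and then dropping tuples that found no partial match, gives the mapping the question asks for. With three fields, `0.6 < ratio < 1` means exactly two of the three fields agree. A short check:

```python
results_a = [(101010101, 111111111, 999999999), (121212121, 222222222, 999999999)]
results_b = [(101010101, 111111111, 999999999), (121212121, 333333333, 999999999),
             (303030303, 444444444, 999999999)]

def partial_match(x, y):
    # Fraction of agreeing positions; a ratio of 1 (exact match) is excluded
    nb_matches = sum(ea == eb for ea, eb in zip(x, y))
    return 0.6 < nb_matches / len(x) < 1

pmatch = {el_a: [el_b for el_b in results_b if partial_match(el_a, el_b)]
          for el_a in results_a}
# Drop tuples that had no partial (non-exact) match
pmatch = {k: v for k, v in pmatch.items() if v}
print(pmatch)
# {(121212121, 222222222, 999999999): [(121212121, 333333333, 999999999)]}
```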

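Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this page, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.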

 
© 2020-2024 STACKOOM.COM