Find the most similar list from a collection of lists, online

Question

I have a set of lists, each of which represents a path on a graph:

list1 = [1,2,3,6] # directed edge from 1->2, 2->3, 3->6
list2 = [8,3,5,6]
list3 = [9,1,3,4]
list4 = [7,8,1,4]

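I also have the adjacency matrix of the graph.

At each time step I receive one edge, for example time step 0: [1,2], time step 1: [3,6]. At each time step I must find the most similar list, taking the previous time steps into account. In other words, the list that is most complete.

What is an efficient way to do this?

I tried a naive approach that compares the incoming edge against every edge in every list, but since I have a large number of lists, each with a large number of edges, this is too slow.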

Update: an example input and output at each time step.

time step 0: input [1,2], output: list1

time step 1: input [8,3], output: list1, list2 # equally complete

time step 2: input [3,6], output: list1

Update 2: thanks to @Nuclearman, I wrote a solution (maybe a naive one?)

list1 = [1,2,3,6] # directed edge from 1->2, 2->3, 3->6
list2 = [8,3,5,6]
list3 = [9,1,3,4]
list4 = [7,8,1,4]

lists_dict = {
    'list1' : list1,
    'list2' : list2,
    'list3' : list3,
    'list4' : list4
}


edges = {
    'list1' : len(list1) - 1,
    'list2' : len(list2) - 1,
    'list3' : len(list3) - 1,
    'list4' : len(list4) - 1
}

covered_edges = {
    'list1' : 0,
    'list2' : 0,
    'list3' : 0,
    'list4' : 0
}

completeness = {
    'list1' : covered_edges['list1']/edges['list1'],
    'list2' : covered_edges['list2']/edges['list2'],
    'list3' : covered_edges['list3']/edges['list3'],
    'list4' : covered_edges['list4']/edges['list4']
}

graph = {}

for list_name in lists_dict.keys():
    idx = 0
    
    while idx < len(lists_dict[list_name])-1:
        
        node1 = lists_dict[list_name][idx]
        node2 = lists_dict[list_name][idx+1]

        if node1 in graph.keys(): #if exist
            graph[node1][node2] =  list_name
            
        else: #if doesnt exist
            graph[node1] = {node2: list_name}
        
        idx+=1
        
times= [[1,2],[3,5],[5,6],[8,1],[7,8]]
for time in times:
    edge_in_list = graph[time[0]][time[1]] #list name

    covered_edges[edge_in_list] +=1
    print(covered_edges)
    
    completeness = {
    'list1' : covered_edges['list1']/edges['list1'],
    'list2' : covered_edges['list2']/edges['list2'],
    'list3' : covered_edges['list3']/edges['list3'],
    'list4' : covered_edges['list4']/edges['list4']
    }
    
    mx = max(completeness.values())
    max_list = [k for k, v in completeness.items() if v == mx]
    
    print(max_list)
    print('')

Answer 1

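Try representing the graph as an adjacency list stored as a nested hash.

I.e., you could set up the whole example like this (I don't remember whether this is valid Python):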

graph = {
  1: {2: [1], 3: [3], 4: [4] },
  2: {3: [1] },
  3: {6: [1], 5: [2], 4: [3] },
  5: {6: [2] },
  7: {8: [4] },
  8: {3: [2], 1: [4] },
  9: {1: [3] },
}

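Then you just keep a record of how many edges remain in each list, and store it in a data structure that supports find-min (or find-max, just adjust the keys), lookup, insertion, and deletion in O(log N) or better. Depending on how you define completeness, you may need to do a little math. I.e., you may need to store the total and covered edge counts, then use [(total - covered) / total, list #] or something similar as the key for the data structure. That way you can find the right list quickly even when several lists have the same completeness. For the result you want, return all lists with the highest completeness.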
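One way to get the find-max behaviour described above is a binary heap with lazy invalidation. The following is a minimal sketch, not the answerer's code: the class name `CompletenessTracker` and its methods are hypothetical, completeness is assumed to be covered / total, and list ids are assumed to be mutually comparable (all ints or all strings).

```python
import heapq
from fractions import Fraction


class CompletenessTracker:
    """Hypothetical helper: tracks covered/total per list and finds the
    most complete lists with a heap that tolerates stale entries."""

    def __init__(self, totals):
        self.totals = dict(totals)                   # list id -> total edges
        self.covered = {k: 0 for k in self.totals}   # list id -> covered edges
        # heapq is a min-heap, so the key is the negated completeness.
        self.heap = [(Fraction(0), k) for k in self.totals]
        heapq.heapify(self.heap)

    def _ratio(self, list_id):
        # Exact fractions avoid float rounding when comparing ties.
        return Fraction(self.covered[list_id], self.totals[list_id])

    def cover(self, list_id):
        """Record one more covered edge for list_id in O(log L)."""
        self.covered[list_id] += 1
        # The older entry for this list stays in the heap and is now stale.
        heapq.heappush(self.heap, (-self._ratio(list_id), list_id))

    def best(self):
        """Return every list id tied for the highest completeness."""
        winners, kept, top = [], [], None
        while self.heap:
            neg, list_id = self.heap[0]
            if -neg != self._ratio(list_id):
                heapq.heappop(self.heap)             # drop a stale entry
                continue
            if top is not None and -neg != top:
                break                                # next valid entry is worse
            top = -neg
            kept.append(heapq.heappop(self.heap))
            winners.append(list_id)
        for entry in kept:                           # restore the valid entries
            heapq.heappush(self.heap, entry)
        return sorted(winners)


tracker = CompletenessTracker({1: 3, 2: 3, 3: 3, 4: 3})
tracker.cover(1)            # edge [1,2] belongs to list 1
print(tracker.best())       # [1]
tracker.cover(2)            # edge [8,3] belongs to list 2
print(tracker.best())       # [1, 2]
```

Because completeness only grows, each list has exactly one up-to-date heap entry at any time, so stale entries can simply be discarded when they surface at the top.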

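The graph above lets you quickly determine which lists each edge belongs to; you then look those lists up in the remaining counts and decrement each list's count by one. I.e., you can see that graph[1][2] is list 1, graph[8][3] is list 2, and graph[3][6] is also list 1.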
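As a rough sketch of that lookup, the nested hash can be built from the paths themselves rather than typed by hand. The helper names `build_edge_index` and `process_edge` are made up for illustration, and the list ids 1 to 4 stand for list1 to list4 from the question.

```python
def build_edge_index(paths):
    """Map every directed edge u->v to the ids of the lists containing it.

    paths: dict of list id -> node sequence.
    Returns a nested dict where graph[u][v] is a list of list ids.
    """
    graph = {}
    for list_id, nodes in paths.items():
        for u, v in zip(nodes, nodes[1:]):   # consecutive nodes form an edge
            graph.setdefault(u, {}).setdefault(v, []).append(list_id)
    return graph


paths = {1: [1, 2, 3, 6], 2: [8, 3, 5, 6], 3: [9, 1, 3, 4], 4: [7, 8, 1, 4]}
graph = build_edge_index(paths)
remaining = {list_id: len(nodes) - 1 for list_id, nodes in paths.items()}


def process_edge(u, v):
    """Decrement the remaining count of each list that contains u->v.

    Returns the affected list ids (empty if the edge is in no list).
    """
    owners = graph.get(u, {}).get(v, [])
    for list_id in owners:
        remaining[list_id] -= 1
    return owners


for u, v in [(1, 2), (8, 3), (3, 6)]:
    process_edge(u, v)
print(remaining)   # {1: 1, 2: 2, 3: 3, 4: 3}
```

Storing a list of ids per edge, instead of a single name, keeps the lookup correct if two paths ever share an edge.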

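For performance, you may also want to keep a set of edges that have already been seen and skip any repeats, although this may or may not be necessary, and may or may not be how you want to handle them.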
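If repeated edges should be ignored, a plain set of (u, v) tuples is probably enough. This is only a sketch of that option; `make_edge_filter` is a hypothetical name, and whether duplicates should be skipped at all depends on how completeness is meant to be counted.

```python
def make_edge_filter():
    """Return a function that reports whether a directed edge is new."""
    seen = set()

    def is_new(u, v):
        edge = (u, v)            # direction matters: (1, 2) != (2, 1)
        if edge in seen:
            return False
        seen.add(edge)
        return True

    return is_new


is_new = make_edge_filter()
stream = [(1, 2), (8, 3), (1, 2), (3, 6)]
fresh = [edge for edge in stream if is_new(*edge)]
print(fresh)   # [(1, 2), (8, 3), (3, 6)]
```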

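Performance is proportional to the number of lists you need to change, which makes it output-sensitive. If the example provided is anything to go on, the number of list counts you need to update for each incoming edge is probably very small compared to the number of lists. Suppose there are E edges in total across all L lists, you need to process K edges online, and those K edges result in a total of A list updates. (A is an output-sensitive variable that depends on how much overlap there is between lists. The example you gave has zero overlap, since every edge has exactly one list associated with it, but it's unclear whether that will hold with more lists and edges.) Then the performance is O(E + K + A log L), I believe, since each of those A processed lists needs a log L operation to find and update the list count. E is the preprocessing required to build the graph. That seems basically optimal, unless there is something else going on. It is probably much better than the O(K*E) you currently have, unless you have extremely high overlap (A).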

Solution 1
1 Accepted 2020-12-31 23:33:27
