How to compare a given set with the available sets to find the one with the most intersecting elements, when there are a million elements in total?
Available sets are:
A={"one","two","three"}
B={"two","three","four"}
C={"four","five"}
Given set is:
D = {"four","five","six"}
The task is to find which available set has the most elements in common with the given set.
Here:
C contains 2 elements of D.
B contains 1 element of D.
This can be computed by intersecting D with each of A, B and C.
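For the small example, the overlap sizes can be checked directly with Python's set intersection operator. This is only a baseline sketch: the dictionary layout and the name best_match_brute_force are illustrative assumptions, and the approach touches every available set on every query, which is what becomes expensive at millions of sets.

```python
# Example data from the question, keyed by set name (layout is an assumption).
available = {
    "A": {"one", "two", "three"},
    "B": {"two", "three", "four"},
    "C": {"four", "five"},
}
given = {"four", "five", "six"}


def best_match_brute_force(available, given):
    """Return the name of the set sharing the most elements with `given`.

    Hypothetical helper: it intersects `given` with every available set,
    so the cost grows with the number of available sets.
    """
    return max(available, key=lambda name: len(available[name] & given))


# Size of the intersection of each available set with the given set.
overlaps = {name: len(members & given) for name, members in available.items()}
best = best_match_brute_force(available, given)
```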
How can the closest set be found when there are millions of available sets?
Build a data structure in which the elements become the keys (an inverted index), so that each element maps to the names of the sets that contain it. For your example, the data structure would look like this:
"one": {A}
"two": {A,B}
"three": {A,B}
"four": {B,C}
"five": {C}
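A mapping like the one above could be built in a single pass over the available sets. The following is a minimal sketch, assuming the sets fit in memory as a dict of name to set; build_index is a hypothetical helper name, not something from a library.

```python
from collections import defaultdict


def build_index(available):
    """Map each element to the names of the sets that contain it."""
    index = defaultdict(set)
    for name, members in available.items():
        for element in members:
            index[element].add(name)
    # Return a plain dict so that looking up an unknown element
    # does not silently create an empty entry.
    return dict(index)


available = {
    "A": {"one", "two", "three"},
    "B": {"two", "three", "four"},
    "C": {"four", "five"},
}
index = build_index(available)
```

The index is built once and reused for every query, so the per-query work depends only on the elements of the given set and how many sets each of them appears in.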
Now all you need to do is take each element of your input set D and increment a counter for every set name listed under that element. In your example, D is {"four","five","six"}.
So you loop through "four", "five" and "six":
Step 1: The counter will be all zeros initially
Step 2: After looking at the values for "four" the counter will look like below
B:1, C:1
Step 3: After looking at the values for "five" the counter will look like below
B:1, C:2
Step 4: After looking at the values for "six" the counter will look like below
B:1, C:2
Step 5: Choose the set with the maximum value. In this case it will be C.
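The steps above can be traced with a plain dictionary as the counter. In this sketch the index is written out by hand, the given elements are kept in a list so the order matches the steps, and the trace variable exists only to record the counter after each element; all of these are illustrative choices.

```python
# Element -> names of the sets containing it, as in the example.
index = {
    "one": {"A"},
    "two": {"A", "B"},
    "three": {"A", "B"},
    "four": {"B", "C"},
    "five": {"C"},
}
given = ["four", "five", "six"]  # a list, to keep the order used in the steps

counts = {}  # Step 1: no set has been counted yet
trace = []   # snapshot of the counter after each element (Steps 2 to 4)
for element in given:
    # Elements missing from the index (here "six") contribute nothing.
    for name in index.get(element, ()):
        counts[name] = counts.get(name, 0) + 1
    trace.append(dict(counts))

# Step 5: pick the set with the highest count, or None if nothing matched.
best = max(counts, key=counts.get) if counts else None
```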
If you are using Python, you can use the most_common method of collections.Counter.
https://docs.python.org/3/library/collections.html#collections.Counter
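With Counter, the counting loop and the final selection become a few lines. This is a sketch under the assumption that the element-to-set-names index already exists as a dict; closest_set is a hypothetical function name.

```python
from collections import Counter


def closest_set(index, given):
    """Return (set_name, shared_count) for the best match, or None."""
    counts = Counter()
    for element in given:
        # Counter.update with an iterable adds 1 for each name it yields.
        counts.update(index.get(element, ()))
    top = counts.most_common(1)
    return top[0] if top else None


index = {
    "one": {"A"},
    "two": {"A", "B"},
    "three": {"A", "B"},
    "four": {"B", "C"},
    "five": {"C"},
}
result = closest_set(index, {"four", "five", "six"})
```

Passing a larger number to most_common would return several top candidates instead of one, which may help when ties matter.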