How to efficiently create a set of sets in Python?

Question

I have two dataframes, each with two columns. The rows are value pairs where order does not matter: for my purposes, ab == ba. I need to compare these value pairs between the two dataframes. I have a solution, but it is terribly slow for a dataframe with 300k rows.

import pandas as pd

df1 = pd.DataFrame({"col1" : [1,2,3,4], "col2":[2,1,5,6]})
df2 = pd.DataFrame({"col1" : [2,1,3,4], "col2":[1,9,8,9]})

mysets = [{x[0],x[1]} for x in df1.values.tolist()]
df1sets = []
for element in mysets:
    if element not in df1sets:
        df1sets.append(element)
           
mysets = [{x[0],x[1]} for x in df2.values.tolist()]
df2sets = []
for element in mysets:
    if element not in df2sets:
        df2sets.append(element)

intersect_sets = [x for x in df1sets if x in df2sets]

This works, but it is terribly slow, and there must be an easier way to do this. One of my problems is that I cannot add a set to a set: I cannot create {{1,2}, {2,3}}, etc.

Answer 1

A pandas solution is to sort the values in each row, merge the two frames, remove duplicates, and convert the resulting rows to sets:

import numpy as np

intersect_sets = ([set(x) for x in pd.DataFrame(np.sort(df1.to_numpy(), axis=1))
                        .merge(pd.DataFrame(np.sort(df2.to_numpy(), axis=1)))
                        .drop_duplicates()
                        .to_numpy()])
       
print (intersect_sets)
[{1, 2}]
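
If the one-liner is hard to follow, the same idea can be sketched step by step. This is only an illustration under a few assumptions: the column names "a" and "b" and the variable names left, right, common and pairs are made up here, and duplicates are dropped before the merge rather than after, which should give the same pairs while keeping the join smaller.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": [2, 1, 5, 6]})
df2 = pd.DataFrame({"col1": [2, 1, 3, 4], "col2": [1, 9, 8, 9]})

# Sort the values inside each row so that (2, 1) and (1, 2) become the same pair.
# The column names "a" and "b" are hypothetical, chosen only for this sketch.
left = pd.DataFrame(np.sort(df1.to_numpy(), axis=1), columns=["a", "b"])
right = pd.DataFrame(np.sort(df2.to_numpy(), axis=1), columns=["a", "b"])

# An inner merge on both columns keeps only the pairs present in both frames.
common = left.drop_duplicates().merge(right.drop_duplicates(), on=["a", "b"])

# Convert the result to plain Python tuples of ints.
pairs = [tuple(int(v) for v in row) for row in common.to_numpy()]
print(pairs)
```

Because the rows are sorted first, each unordered pair has exactly one representation, so an ordinary merge is enough to compare them.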

Another idea, using a set of frozensets:

intersect_sets = (set([frozenset(x) for x in df1.to_numpy()]) & 
                  set([frozenset(x) for x in df2.to_numpy()]))
print (intersect_sets)
{frozenset({1, 2})}
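
As for why {{1,2}, {2,3}} cannot be created: elements of a set must be hashable, and a plain set is mutable and therefore unhashable, while a frozenset is immutable and hashable. A minimal sketch of that, using plain tuples in place of the dataframe rows (the names pairs_a, pairs_b, raised and as_plain_sets are illustrative only):

```python
# Value pairs copied from df1 and df2 in the question, as plain tuples.
pairs_a = [(1, 2), (2, 1), (3, 5), (4, 6)]
pairs_b = [(2, 1), (1, 9), (3, 8), (4, 9)]

# A set of plain sets fails because set objects are not hashable.
raised = False
try:
    {{1, 2}, {2, 3}}
except TypeError:
    raised = True

# frozenset is hashable, so it can be a set element; order inside a pair is
# ignored and duplicates collapse automatically.
set_a = {frozenset(p) for p in pairs_a}
set_b = {frozenset(p) for p in pairs_b}

# Set intersection uses hashing instead of scanning a list for every element.
common = set_a & set_b

# Convert back to ordinary sets if mutable sets are needed afterwards.
as_plain_sets = [set(fs) for fs in common]
print(as_plain_sets)
```

This is also where the speed difference from the loop in the question comes from: "element not in df1sets" scans a list on every iteration, whereas membership in a set of frozensets is a hash lookup.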
