Linking lists that share common elements

Question

I have an issue similar to this one, with a few differences and complications.

I have a list of groups containing members. Rather than merging the groups that share members, I need to preserve the groupings and create a new set of edges based on which groups have members in common, and do so conditionally based on attributes of the groups.

The source data looks like this:

+----------+------------+-----------+
| Group ID | Group Type | Member ID |
+----------+------------+-----------+
| A        | Type 1     |         1 |
| A        | Type 1     |         2 |
| B        | Type 1     |         2 |
| B        | Type 1     |         3 |
| C        | Type 1     |         3 |
| C        | Type 1     |         4 |
| D        | Type 2     |         4 |
| D        | Type 2     |         5 |
+----------+------------+-----------+

The desired output is this:

+----------+-----------------+
| Group ID | Linked Group ID |
+----------+-----------------+
| A        | B               |
| B        | C               |
+----------+-----------------+

A is linked to B because they share member 2.
B is linked to C because they share member 3.
C is not linked to D: they have member 4 in common, but the groups are of different types.

The number of shared members doesn't matter for my purposes; a single member in common means the groups are linked.

The output is being used as the edges of a graph, so if the output is itself a graph that fits these rules, that's fine.

The source dataset is large (hundreds of millions of rows), so performance is a consideration.

This poses a similar question; however, I'm new to Python and can't figure out how to get the source data into a form where I can use its answer, or how to work in the additional requirement that the group types match.

Answer 1

Try something like this:

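# For each (group type, member) pair, join the IDs of all groups containing that member
df1 = df.groupby(['Group Type', 'Member ID'])['Group ID'].apply(','.join).reset_index()
# Keep only members that appear in more than one group of the same type
df2 = df1[df1['Group ID'].str.contains(',')]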

This might not handle the case of cyclic grouping.
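
The snippet above yields comma-joined group IDs per shared member rather than the two-column edge list shown in the question. One possible way to get from the source table to that edge list is a self-merge on group type and member ID. This is only a sketch, not part of the original answer: the function name `linked_groups` and the `_linked` suffix are made up for illustration, and it assumes `Group ID` values can be compared with `<`.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Group ID":   ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Group Type": ["Type 1"] * 6 + ["Type 2"] * 2,
    "Member ID":  [1, 2, 2, 3, 3, 4, 4, 5],
})

def linked_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per pair of same-type groups sharing a member."""
    rows = df[["Group ID", "Group Type", "Member ID"]].drop_duplicates()
    # Pair up every two groups that contain the same member AND have the same type
    pairs = rows.merge(rows, on=["Group Type", "Member ID"], suffixes=("", "_linked"))
    # Drop self-pairs and keep each unordered pair only once (A-B, not also B-A)
    pairs = pairs[pairs["Group ID"] < pairs["Group ID_linked"]]
    return (
        pairs[["Group ID", "Group ID_linked"]]
        .drop_duplicates()  # several shared members still give a single edge
        .rename(columns={"Group ID_linked": "Linked Group ID"})
        .sort_values(["Group ID", "Linked Group ID"])
        .reset_index(drop=True)
    )

edges = linked_groups(df)
print(edges)
```

On the sample data this gives the edges A-B and B-C; C and D are not paired because their types differ. Note that the self-merge grows quadratically with the number of groups a single member belongs to, so for hundreds of millions of rows the same join logic may need to run in a database or an out-of-core engine rather than in memory.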

Answer 1: 1 · accepted · 2020-08-25 18:18:25