

Link lists that share common elements

I have an issue similar to this one, with a few differences and complications.

I have a list of groups containing members. Rather than merging the groups that share members, I need to preserve the groupings and create a new set of edges based on which groups have members in common, and do so conditionally based on attributes of the groups.

The source data looks like this:

+----------+------------+-----------+
| Group ID | Group Type | Member ID |
+----------+------------+-----------+
| A        | Type 1     |         1 |
| A        | Type 1     |         2 |
| B        | Type 1     |         2 |
| B        | Type 1     |         3 |
| C        | Type 1     |         3 |
| C        | Type 1     |         4 |
| D        | Type 2     |         4 |
| D        | Type 2     |         5 |
+----------+------------+-----------+
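To experiment with the answers, the sample table can be loaded into a pandas DataFrame. This is a minimal sketch assuming the real data has the same three columns; in practice it would more likely come from `pd.read_csv` or a database query.

```python
import pandas as pd

# Sample rows from the source table above; column names copied from its header
df = pd.DataFrame({
    'Group ID':   ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Group Type': ['Type 1'] * 6 + ['Type 2'] * 2,
    'Member ID':  [1, 2, 2, 3, 3, 4, 4, 5],
})
print(df)
```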

The desired output is:

+----------+-----------------+
| Group ID | Linked Group ID |
+----------+-----------------+
| A        | B               |
| B        | C               |
+----------+-----------------+

A is linked to B because they share member 2.
B is linked to C because they share member 3.
C is not linked to D: they share member 4, but the two groups are of different types.
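One way to express these rules in pandas is a self-join on both Group Type and Member ID: two groups end up in the same joined row only if they share a member and have the same type, so C and D are never paired. This is a minimal sketch, not the answer's method; the column names come from the tables above, and the intermediate names `pairs` and `edges` are illustrative.

```python
import pandas as pd

# Sample rows from the source table in the question
df = pd.DataFrame({
    'Group ID':   ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Group Type': ['Type 1'] * 6 + ['Type 2'] * 2,
    'Member ID':  [1, 2, 2, 3, 3, 4, 4, 5],
})

# Join the table to itself on type AND member; the overlapping
# 'Group ID' column gets pandas' default suffixes _x and _y
pairs = df.merge(df, on=['Group Type', 'Member ID'])

# Drop self-pairs (A-A) and mirrored duplicates (B-A) by keeping x < y
pairs = pairs[pairs['Group ID_x'] < pairs['Group ID_y']]

# One shared member is enough, so each pair of groups is kept only once
edges = (
    pairs[['Group ID_x', 'Group ID_y']]
    .drop_duplicates()
    .rename(columns={'Group ID_x': 'Group ID', 'Group ID_y': 'Linked Group ID'})
    .reset_index(drop=True)
)
print(edges)
```

Be aware that a member belonging to k groups of the same type produces k² joined rows before filtering, so a few very widely shared members can dominate memory on a large dataset.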

The number of shared members doesn't matter for my purposes; a single member in common means the groups are linked.

The output is being used as the edges of a graph, so if the output is itself a graph that fits these rules, that's fine.
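Because the output feeds a graph, the two-column edge list maps directly onto an undirected adjacency structure. The sketch below uses only the standard library and hard-codes the desired output as its input; with networkx installed, `nx.from_pandas_edgelist(edges, 'Group ID', 'Linked Group ID')` would build an equivalent undirected graph from an edge DataFrame.

```python
from collections import defaultdict

# Edge list in the shape of the desired output table
edges = [('A', 'B'), ('B', 'C')]

# Undirected adjacency sets: each link is recorded in both directions
graph = defaultdict(set)
for group, linked in edges:
    graph[group].add(linked)
    graph[linked].add(group)

print(dict(graph))
```

Group D has no same-type partner, so it never appears as a key; if isolated groups matter for the graph, they would need to be added separately.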

The source dataset is large (hundreds of millions of rows), so performance is a consideration.
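At this scale, one option that avoids holding a full self-join in memory is to process rows sorted by (Group Type, Member ID) and emit pairs one bucket at a time. This is a sketch under the assumption that the data can be streamed in that order (for example from an `ORDER BY` query or an external sort); `linked_groups` is a hypothetical helper name, and the set of distinct edges is still assumed to fit in memory.

```python
from itertools import combinations, groupby
from operator import itemgetter

def linked_groups(rows):
    """Yield (group, linked_group) edges from (group_id, group_type, member_id) rows.

    Assumes `rows` is already sorted by (group_type, member_id), so only
    one bucket of groups sharing a member is held in memory at a time.
    """
    seen = set()  # distinct edges emitted so far; assumed to fit in memory
    for _, bucket in groupby(rows, key=itemgetter(1, 2)):
        groups = sorted({row[0] for row in bucket})  # groups sharing this member
        for pair in combinations(groups, 2):         # every unordered pair, a < b
            if pair not in seen:
                seen.add(pair)
                yield pair

rows = [
    ('A', 'Type 1', 1), ('A', 'Type 1', 2),
    ('B', 'Type 1', 2), ('B', 'Type 1', 3),
    ('C', 'Type 1', 3), ('C', 'Type 1', 4),
    ('D', 'Type 2', 4), ('D', 'Type 2', 5),
]
rows.sort(key=itemgetter(1, 2))
print(list(linked_groups(rows)))  # → [('A', 'B'), ('B', 'C')]
```

The number of pairs a bucket emits still grows quadratically with the number of groups sharing that member; that is inherent to an edge-list output rather than to this approach.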

This poses a similar question; however, I'm new to Python and can't figure out how to get the source data into a shape where I can use the answer, or how to work in the additional requirement that the group types match.

Try something like this:

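```python
# For each (type, member), join the IDs of the groups containing it, e.g. "A,B"
df1 = df.groupby(['Group Type', 'Member ID'])['Group ID'].apply(','.join).reset_index()
# Keep only members shared by two or more groups of the same type
df2 = df1[df1['Group ID'].str.contains(',')]
```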

This might not handle the case of cyclic grouping.
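The answer's `df2` holds comma-joined strings such as "A,B" rather than edge pairs. A possible follow-up step, assuming the graph needs one entry per pair, splits each string and takes every unordered combination with `itertools.combinations`. This is an illustration, not part of the original answer.

```python
from itertools import combinations
import pandas as pd

# Sample rows from the source table in the question
df = pd.DataFrame({
    'Group ID':   ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Group Type': ['Type 1'] * 6 + ['Type 2'] * 2,
    'Member ID':  [1, 2, 2, 3, 3, 4, 4, 5],
})

# The answer's approach: comma-joined group IDs per (type, member)
df1 = df.groupby(['Group Type', 'Member ID'])['Group ID'].apply(','.join).reset_index()
df2 = df1[df1['Group ID'].str.contains(',')]

# Expand each joined string into every unordered pair, so a member shared by
# three groups "A,B,C" yields A-B, A-C and B-C; the set removes repeated pairs
edges = sorted({
    pair
    for joined in df2['Group ID']
    for pair in combinations(sorted(set(joined.split(','))), 2)
})
print(edges)  # → [('A', 'B'), ('B', 'C')]
```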

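Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.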

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM