Link lists that share common elements
I have an issue similar to this one, with a few differences/complications.
I have a list of groups containing members. Rather than merging the groups that share members, I need to preserve the groupings and create a new set of edges based on which groups have members in common, and to do so conditionally based on attributes of the groups.
The source data looks like this:
+----------+------------+-----------+
| Group ID | Group Type | Member ID |
+----------+------------+-----------+
| A        | Type 1     | 1         |
| A        | Type 1     | 2         |
| B        | Type 1     | 2         |
| B        | Type 1     | 3         |
| C        | Type 1     | 3         |
| C        | Type 1     | 4         |
| D        | Type 2     | 4         |
| D        | Type 2     | 5         |
+----------+------------+-----------+
The desired output is this:
+----------+-----------------+
| Group ID | Linked Group ID |
+----------+-----------------+
| A        | B               |
| B        | C               |
+----------+-----------------+
A is linked to B because they share member 2.
B is linked to C because they share member 3.
C is not linked to D: they have a member in common (4) but are of different types.
The number of shared members doesn't matter for my purposes; a single member in common means the groups are linked.
The output is being used as the edges of a graph, so if the output is a graph that fits the rules, that's fine.
The source dataset is large (hundreds of millions of rows), so performance is a consideration.
This poses a similar question; however, I'm new to Python and can't figure out how to get the source data to a point where I can use the answer, or how to work in the additional requirement that the group types match.
Try something like this:
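# For each (Group Type, Member ID) pair, join the IDs of the groups containing that member
df1 = df.groupby(['Group Type', 'Member ID'])['Group ID'].apply(','.join).reset_index()
# Keep only members that appear in more than one group of the same type
df2 = df1[df1['Group ID'].str.contains(',')]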
This might not handle the case of cyclic grouping.
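The snippet above stops at comma-joined strings of group IDs, which still have to be split into pairs. One possible way to get the two-column edge list directly is a self-merge on the type and member columns. This is only a sketch: the function name `linked_groups` and the `_linked` suffix are made up for illustration, the sample frame is copied from the question, and it has not been tried at the scale mentioned there.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Group ID":   ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Group Type": ["Type 1"] * 6 + ["Type 2"] * 2,
    "Member ID":  [1, 2, 2, 3, 3, 4, 4, 5],
})

def linked_groups(df):
    """Return one row per pair of same-type groups sharing at least one member.

    `linked_groups` is a hypothetical helper name, not part of pandas.
    """
    # Pair up every two rows that have the same type AND the same member
    pairs = df.merge(df, on=["Group Type", "Member ID"], suffixes=("", "_linked"))
    # Drop self-pairs (A, A) and keep only one direction of each pair (A, B)
    pairs = pairs[pairs["Group ID"] < pairs["Group ID_linked"]]
    # Several shared members yield the same pair more than once, so de-duplicate
    return (
        pairs[["Group ID", "Group ID_linked"]]
        .drop_duplicates()
        .rename(columns={"Group ID_linked": "Linked Group ID"})
        .sort_values(["Group ID", "Linked Group ID"])
        .reset_index(drop=True)
    )

edges = linked_groups(df)
print(edges)
```

Because the merge key includes Group Type, C and D never pair up even though both contain member 4. Note that the self-merge produces a number of rows quadratic in the number of groups a member belongs to, so on hundreds of millions of rows it may need to be run in chunks or moved to a database or distributed engine.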