I have a pandas column with lists. Group rows that contain at least one common element from the same column
I have a pandas df with one column of lists. I would like to group all lists that have at least one element in common.
Input df:
Category
0 [IAB19, IAB81, IAB82]
1 [IAB25, IAB27]
2 [IAB19, IAB20]
3 [IAB22, IAB55, IAB56, IAB58]
4 [IAB81, IAB89]
5 [IAB82, IAB95]
I want to find out whether any code in df['Category'] is also present in any other row of df['Category'].
If so, I want to merge the lists that share at least one common element.
Expected output:
Category
0 [IAB19, IAB81, IAB82, IAB20, IAB89, IAB95]
1 [IAB25, IAB27]
2 [IAB22, IAB55, IAB56, IAB58]
Any thoughts?
This is a hidden network problem, so we can try networkx, but before that you may need to explode the list column so that each list item gets its own row (explode is available from pandas 0.25 onwards).
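To make the explode step concrete, here is a minimal sketch on a small made-up frame (the `demo` data is illustrative only, not the asker's real data). Each list element gets its own row, and the original index label is repeated for every element of the same list:

```python
import pandas as pd

# Small illustrative frame; the values are assumptions for the demo
demo = pd.DataFrame({'Category': [['IAB19', 'IAB81'], ['IAB25']]})

# One row per list element; the index label of the source row is repeated
exploded = demo.explode('Category')
print(exploded)
```

The repeated index is what lets the answer's code remember which original row each code came from.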
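import networkx as nx
# Remember each row's original index, then put one code per row
df['Key'] = df.index
df = df.explode('Category')
# Build a graph in which every code is linked to the rows it appears in
G = nx.from_pandas_edgelist(df, 'Category', 'Key')
# Each connected component is one group of rows that share codes
components = list(nx.connected_components(G))
# Map every node (row key or code) to the number of its component
d = {node: i for i, comp in enumerate(components) for node in comp}
# Collect the codes of all rows that fall into the same component
s = df.groupby(df.Key.map(d)).Category.apply(set)
s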
Key
0 {IAB89, IAB82, IAB19, IAB95, IAB81, IAB20}
1 {IAB27, IAB25}
2 {IAB55, IAB56, IAB22, IAB58}
Name: Category, dtype: object
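The result above holds sets, whose element order is arbitrary. If lists in first-seen order inside a DataFrame are wanted, as in the expected output, one possible end-to-end sketch looks like this. It follows the same networkx idea; the `row_` prefix and the names `long`, `group_of` and `merged` are my own additions for illustration, and the prefix is only there so that a row key can never collide with a code value:

```python
import networkx as nx
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Category': [
    ['IAB19', 'IAB81', 'IAB82'],
    ['IAB25', 'IAB27'],
    ['IAB19', 'IAB20'],
    ['IAB22', 'IAB55', 'IAB56', 'IAB58'],
    ['IAB81', 'IAB89'],
    ['IAB82', 'IAB95'],
]})

# One row per code, remembering which original row it came from
long = df.explode('Category').rename_axis('Key').reset_index()
# Tag row keys so they can never be confused with a code value
long['Key'] = 'row_' + long['Key'].astype(str)

G = nx.from_pandas_edgelist(long, 'Category', 'Key')
# Map every node to the number of its connected component
group_of = {node: i
            for i, comp in enumerate(nx.connected_components(G))
            for node in comp}

# Number the groups by first appearance and keep codes in first-seen order
labels = pd.factorize(long['Category'].map(group_of))[0]
merged = (long.groupby(labels)['Category']
              .apply(lambda codes: list(dict.fromkeys(codes)))
              .to_frame())
print(merged)
```

On the sample data this gives three rows whose lists match the expected output in the question. `dict.fromkeys` is used only to drop duplicates while keeping the order in which the codes first appear.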