I have pandas df with 1 column with lists. I would like to group all lists which have at least one element in common.
Input Df :
>
Category
0 [IAB19, IAB81, IAB82]
1 [IAB25, IAB27]
2 [IAB19, IAB20]
3 [IAB22, IAB55, IAB56, IAB58]
4 [IAB81, IAB89]
5 [IAB82, IAB95]
i want to find out if any codes in df['Category'] is present in any other row of df['Category'].
And if yes, I would want to merge those lists sharing at least 1 common element.
Expected Output:
Category
0 [IAB19, IAB81, IAB82, IAB20, IAB89, IAB95]
1 [IAB25, IAB27]
2 [IAB22, IAB55, IAB56, IAB58]
Any thoughts?
This is a hidden network problem, so we can try networkx
, but before that you may need to explode
the whole list columns to single row item (function available after pandas
0.25
)
import networkx as nx
df['Key']=df.index
df=df.explode('Category')
G=nx.from_pandas_edgelist(df, 'Category', 'Key')
l=list(nx.connected_components(G))
L=[dict.fromkeys(y,x) for x, y in enumerate(l)]
d={k: v for d in L for k, v in d.items()}
s=df.groupby(df.Key.map(d)).Category.apply(set)
s
Key
0 {IAB89, IAB82, IAB19, IAB95, IAB81, IAB20}
1 {IAB27, IAB25}
2 {IAB55, IAB56, IAB22, IAB58}
Name: Category, dtype: object
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.