I have a pandas column with lists. Group rows that contain at least one common element from the same column

I have a pandas DataFrame with one column of lists. I would like to group all lists that have at least one element in common.

Input df:
    Category
 0  [IAB19, IAB81, IAB82]
 1  [IAB25, IAB27]
 2  [IAB19, IAB20]
 3  [IAB22, IAB55, IAB56, IAB58]
 4  [IAB81, IAB89]
 5  [IAB82, IAB95]

I want to find out whether any code in df['Category'] is present in any other row of df['Category'].

If so, I would like to merge the lists that share at least one common element.
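The first half of the question, whether a row shares any code with some other row, can be answered with explode alone, before any merging. A minimal sketch, assuming pandas >= 0.25 (for `Series.explode`); the `shares_code` column name is hypothetical and not part of the original post:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Category": [
    ["IAB19", "IAB81", "IAB82"],
    ["IAB25", "IAB27"],
    ["IAB19", "IAB20"],
    ["IAB22", "IAB55", "IAB56", "IAB58"],
    ["IAB81", "IAB89"],
    ["IAB82", "IAB95"],
]})

# One entry per code; the original row label is kept in the index
exploded = df["Category"].explode()

# Drop repeated (row, code) pairs so a code repeated inside one list
# is not mistaken for a code shared between rows
pairs = exploded.reset_index().drop_duplicates()
shared_codes = pairs.loc[pairs["Category"].duplicated(keep=False), "Category"]

# A row shares a code if any of its codes appears under another row label
df["shares_code"] = exploded.isin(set(shared_codes)).groupby(level=0).any()
print(df)
```

On the sample data, rows 0, 2, 4 and 5 are flagged, while rows 1 and 3 share nothing with any other row.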

Expected output:

    Category
 0  [IAB19, IAB81, IAB82, IAB20, IAB89, IAB95]
 1  [IAB25, IAB27]
 2  [IAB22, IAB55, IAB56, IAB58]

Any thoughts?

This is a hidden network (graph) problem, so we can try networkx. Before that, explode the list column so that each code gets its own row (DataFrame.explode is available from pandas 0.25 onward).

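import networkx as nx
# Remember which original row each list came from
df['Key'] = df.index
# One row per (code, row) pair; DataFrame.explode needs pandas >= 0.25
df = df.explode('Category')
# Graph whose edges link each code to the row it appears in
G = nx.from_pandas_edgelist(df, 'Category', 'Key')
# Rows that share a code, directly or through other rows, end up in one component
components = list(nx.connected_components(G))
# Map every node (code or row key) to the index of its component
d = {node: i for i, comp in enumerate(components) for node in comp}
s = df.groupby(df.Key.map(d)).Category.apply(set)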
s
Key
0    {IAB89, IAB82, IAB19, IAB95, IAB81, IAB20}
1                                {IAB27, IAB25}
2                  {IAB55, IAB56, IAB22, IAB58}
Name: Category, dtype: object
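The answer above returns sets, so the element order is arbitrary. If you need lists in first-seen order, as in the question's expected output, and want to avoid the networkx dependency, a plain-Python union-find (disjoint-set) over the codes performs the same connected-components grouping. This is a swapped-in technique, not part of the original answer, and `merge_overlapping` is a hypothetical helper name:

```python
import pandas as pd


def merge_overlapping(lists):
    """Merge lists that share at least one element, directly or transitively.

    Union-find over the elements. Returns merged lists with elements in
    first-seen order, and groups ordered by their first appearance.
    """
    parent = {}

    def find(x):
        # Walk up to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for items in lists:
        for item in items:
            parent.setdefault(item, item)
        # Union every element of this list with the list's first element
        for item in items[1:]:
            root_a, root_b = find(items[0]), find(item)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for items in lists:
        for item in items:
            # dict keys keep insertion order, so this preserves first-seen order
            groups.setdefault(find(item), {})[item] = None
    return [list(g) for g in groups.values()]


df = pd.DataFrame({"Category": [
    ["IAB19", "IAB81", "IAB82"],
    ["IAB25", "IAB27"],
    ["IAB19", "IAB20"],
    ["IAB22", "IAB55", "IAB56", "IAB58"],
    ["IAB81", "IAB89"],
    ["IAB82", "IAB95"],
]})

out = pd.DataFrame({"Category": merge_overlapping(df["Category"].tolist())})
print(out)
```

The merge is transitive: row 4 joins row 0 through IAB81 and row 5 through IAB82, even though rows 4 and 5 share nothing directly. Union-find runs in near-linear time in the total number of codes. The networkx version is equally valid if that library is already installed.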
