Grouping rows with same column value on multiple columns

I need to find a way to group rows that share a value in a column, where the grouping has to work across multiple columns. What I need to achieve is grouping the rows that represent a single object having different IDs on different services.

I have a pandas dataframe that looks like this:

SERV1 SERV2 SERV3 SERV4 SERV5 SERV6
8766  NaN   NaN   0989  NaN   NaN   
8766  NaN   5434  NaN   NaN   NaN   
NaN   NaN   5434  3212  NaN   NaN   
NaN   1236  NaN   NaN   NaN   6543
NaN   3456  NaN   7862  NaN   NaN   
NaN   NaN   NaN   7862  NaN   4767

And the desired dataframe should look like this:

SERV1   SERV2   SERV3   SERV4         SERV5   SERV6
[8766]  NaN     [5434]  [0989,3212]   NaN     NaN
NaN     [1236]  NaN     NaN           NaN     [6543]
NaN     [3456]  NaN     [7862]        NaN     [4767]

Columns represent the different services, and each value is an ID that is unique only within its own column (the same value may appear in different columns by accident, but it should not be considered as representing the same ID).

I managed to create a dictionary for each column with the corresponding values, but that is not the same as having a dataframe like the one I want.

By using

df = grouped.aggregate(lambda x: tuple(x))

I could achieve something similar, but that only works for grouping on a single column and does not link it to the others. It also puts together all the NaN values, which don't actually belong together.

I'm looking for ideas/solutions. Thanks.

Not having found a pure pandas solution, I resolved it by using the networkx module: I extracted the subgraphs with the connected_component_subgraphs function and then unpacked the results into a dataframe. Not that elegant, but it works.
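
The answer above does not include its code, so here is a minimal sketch of how the graph approach might look. It is not the original poster's implementation. It assumes IDs are stored as strings (so that 0989 keeps its leading zero), and the function name group_linked_ids is hypothetical. Each node is a (column, value) pair, so equal values in different columns stay distinct. Because connected_component_subgraphs was removed in networkx 2.4, the sketch uses nx.connected_components instead.

```python
import networkx as nx
import numpy as np
import pandas as pd

# Sample data from the question; IDs are strings (an assumption made here).
df = pd.DataFrame(
    [
        ["8766", np.nan, np.nan, "0989", np.nan, np.nan],
        ["8766", np.nan, "5434", np.nan, np.nan, np.nan],
        [np.nan, np.nan, "5434", "3212", np.nan, np.nan],
        [np.nan, "1236", np.nan, np.nan, np.nan, "6543"],
        [np.nan, "3456", np.nan, "7862", np.nan, np.nan],
        [np.nan, np.nan, np.nan, "7862", np.nan, "4767"],
    ],
    columns=["SERV1", "SERV2", "SERV3", "SERV4", "SERV5", "SERV6"],
)


def group_linked_ids(frame):
    """Merge rows that are linked, directly or transitively, by a shared ID."""
    graph = nx.Graph()
    for _, row in frame.iterrows():
        # One node per non-missing cell, keyed by (column, value).
        nodes = [(col, val) for col, val in row.dropna().items()]
        graph.add_nodes_from(nodes)
        # Chain the cells of a row so they end up in the same component.
        graph.add_edges_from(zip(nodes, nodes[1:]))

    records = []
    for component in nx.connected_components(graph):
        cells = {}
        for col, val in sorted(component):
            cells.setdefault(col, []).append(val)
        records.append(cells)

    # Columns with no IDs in a component are left as NaN.
    return pd.DataFrame(records, columns=frame.columns)


result = group_linked_ids(df)
print(result)
```

Each connected component becomes one output row, with the IDs of a column collected into a list. Rows made up entirely of NaN add no nodes and therefore do not appear in the result.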


 