Groupby - DataFrame with lists

Question

I need some help with pandas.

I have a DataFrame that I want to group by the ID column (that part works so far). The Tags column can contain lists with different numbers of elements, as well as empty lists.

g = data_lemmatized.groupby('ID')['Tags'].apply(lambda x: list(np.unique(x)))

This is the original DataFrame:

With the code I used, I'm receiving the following result:

What I would like to have in the new DataFrame is:

- a single list per ID with no sub-lists inside, containing just the elements, or empty

- no duplicates within the lists (a set of each grouped list)

Example:

0 -> []
1 -> []
2 -> [DTU]

Can someone help me, please?

Answer 1

Try this code.

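import pandas as pd

# Sample data: ID 2 appears three times, once with a non-empty tag list.
data_lemmatized = pd.DataFrame({"ID": [0, 1, 2, 2, 2],
                                "Tags": [[], [], ['DTU'], [], []]})

# Concatenate the lists per ID, then drop duplicates via set and convert back to list.
data_lemmatized.groupby('ID')['Tags'].sum().apply(set).apply(list)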

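Here, summing lists concatenates them, so each group collapses into one flat list. Converting to a set removes the duplicates, and the final apply(list) turns each set back into a list. Note that a set does not preserve the original order of the tags.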
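If the order of the tags matters, or the groups are large (repeated list addition copies the list each time), a variant along these lines might be worth trying. It is only a sketch on the same toy data: `flatten_unique` is a hypothetical helper name, not something from pandas, and it assumes the tags are hashable values such as strings.

```python
from itertools import chain

import pandas as pd

# Same toy data as in the answer above.
data_lemmatized = pd.DataFrame({"ID": [0, 1, 2, 2, 2],
                                "Tags": [[], [], ["DTU"], [], []]})


def flatten_unique(lists):
    """Flatten an iterable of lists and drop duplicates, keeping first-seen order.

    Hypothetical helper; assumes the list elements are hashable.
    """
    # dict.fromkeys keeps insertion order, so the first occurrence of each tag wins.
    return list(dict.fromkeys(chain.from_iterable(lists)))


# One flat, de-duplicated list of tags per ID.
g = data_lemmatized.groupby("ID")["Tags"].apply(flatten_unique)
print(g)
```

IDs whose rows only contain empty lists end up with an empty list, and ID 2 ends up with a single flat list containing DTU, which matches the example in the question.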
