Group by - Dataframe with lists
I need some help with Pandas.
I have a DataFrame that I want to group by the ID column (that part works so far). The Tags column can contain lists with different numbers of elements, as well as empty lists.
g = data_lemmatized.groupby('ID')['Tags'].apply(lambda x: list(np.unique(x)))
This is the original DataFrame:
With the code I used, I'm receiving the following result:
What I would like to have in the new DataFrame is:
- a single list per ID with no sub-lists inside, containing just the elements, or empty
- no duplicates within the lists (a set of each grouped list)
Example:
0 -> []
1 -> []
2 -> [DTU]
Can someone help me please?
Try this code.
import pandas as pd
data_lemmatized = pd.DataFrame({"ID":[0, 1, 2, 2, 2],
"Tags": [[], [], ['DTU'], [], []]})
data_lemmatized.groupby('ID')['Tags'].sum().apply(set).apply(list)
Here, summing the lists in each group concatenates them into one flat list. Applying set then removes the duplicates, and list converts the result back to a list.
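One thing to keep in mind is that a set does not guarantee element order, so the tags may come out in a different order than they appeared. If order matters, a possible alternative is sketched below. It is only a sketch under the same sample data as above; the helper name flatten_unique is made up for illustration. It flattens each group with itertools.chain.from_iterable and removes duplicates with dict.fromkeys, which keeps the first occurrence of each element.

```python
from itertools import chain

import pandas as pd

# Same sample data as in the answer above.
data_lemmatized = pd.DataFrame({
    "ID": [0, 1, 2, 2, 2],
    "Tags": [[], [], ["DTU"], [], []],
})


def flatten_unique(lists):
    """Flatten one level of nesting and drop duplicates, keeping first-seen order.

    `flatten_unique` is a hypothetical helper name, not part of pandas.
    """
    # chain.from_iterable joins the sub-lists; dict.fromkeys removes repeats
    # while preserving insertion order.
    return list(dict.fromkeys(chain.from_iterable(lists)))


# One flat, duplicate-free list per ID.
g = data_lemmatized.groupby("ID")["Tags"].apply(flatten_unique)
print(g)

# Plain-Python illustration of why sum() flattens: with a list as the start
# value, + concatenates the lists.
print(sum([["a"], [], ["b"]], []))
```

Like the set-based version, this assumes the tags are hashable (for example strings).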