Pandas: Aggregate each column into a comma-separated list without duplicates

Problem:

I have a large CSV file that looks something like this:

A  B   C     D    ...
1  dog black NULL ...
1  dog white NULL ...
1  dog black NULL ...
2  cat red   NULL ...
...

Now I want to group by column A and aggregate each remaining column into a comma-separated list without duplicates. The result should look something like this:

A  B   C             D    ...
1  dog black, white  NULL ...
2  cat red           NULL ...
...

Since the names and number of columns in the CSV may change, I prefer a solution without hard-coded column names.

Approach used:

I tried pandas with the following code:

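import pandas as pd
# Read the semicolon-separated input file
data = pd.read_csv("C://input.csv", sep=';')
# Try to turn missing values (NaN) into None
data = data.where((pd.notnull(data)), None)
# Group by column A and collapse each remaining column into a set of unique values
data_group = data.groupby(['A']).agg(lambda x: set(x))
data_group.to_csv("C://result.csv", sep=';')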

Using set removes the duplicates exactly as I want. However, the resulting CSV looks like this:

A  B       C                   D      ...
1  {'dog'} {'black', 'white'}  {None} ...
2  {'cat'} {'red'}             {None} ...
...

I don't want the {} and '' in my export, and column D should be empty instead of containing the word None.

Question:

Am I on the right track, or is there a much more elegant way to achieve my goal?

Answer:

Drop the missing values, then join each set with a comma:

# df is the DataFrame read from the CSV (called data in the question)
df.groupby('A', as_index=False).agg(lambda x: ', '.join(set(x.dropna())))

#   A    B             C D
#0  1  dog  white, black  
#1  2  cat           red  
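A slightly more defensive variant of the same answer, offered as a sketch under a few assumptions: the grouping key is taken to be the first column, so not even A is hard-coded; values are converted with astype(str) so that ', '.join does not fail on numeric columns; and Series.unique() replaces set so each list keeps the order of first appearance (set order is arbitrary, which is why the output above shows white before black). The helper join_unique and the inline sample data are hypothetical. Because read_csv treats the string NULL as missing by default, dropna() already leaves column D empty, so the where(..., None) step from the question is not needed:

```python
import io

import pandas as pd


def join_unique(values: pd.Series, sep: str = ", ") -> str:
    """Join the distinct non-missing values of one group, in first-seen order."""
    # dropna() discards NaN cells, so an all-missing group becomes ""
    # astype(str) lets numeric columns be joined too
    # unique() deduplicates while keeping order of first appearance (set() does not)
    return sep.join(values.dropna().astype(str).unique())


# Hypothetical stand-in for the semicolon-separated input file;
# read_csv parses the string "NULL" as NaN by default
raw = "A;B;C;D\n1;dog;black;NULL\n1;dog;white;NULL\n1;dog;black;NULL\n2;cat;red;NULL\n"
df = pd.read_csv(io.StringIO(raw), sep=";")

# Assumption: the grouping key is the first column, so no name is hard-coded
key = df.columns[0]
result = df.groupby(key, as_index=False).agg(join_unique)

# Write without the default integer index; with a real file, pass a path instead
csv_text = result.to_csv(sep=";", index=False)
print(csv_text)
```

If the input uses empty fields instead of the literal NULL, the same code applies, since read_csv also reads empty fields as NaN by default.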


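Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.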

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM