Pandas groupby to a new column

Question

I have a dataframe with columns code and images.

Column images is a string of URLs joined by commas: <URL>,<URL2>,...

Column code is NOT unique. I need to make it unique, but store all images (from all variants) in a new column images_all.

For example:

code something images
1    x         url1,url2,url3
1    x         url1,url4

Result is:

code something images_all
1    x         url1,url2,url3,url4

I did:

grouped = csv.groupby('code')
csv = csv.drop_duplicates(subset=['code'], keep='last')
csv['images_all'] = csv.apply(lambda r:  list(set(
    [image for image in grouped.get_group(r['code'])['images']]
)))

which raises:

KeyError: 'code'

But even if it didn't raise this error, the problem is that the collected images wouldn't be [url1,url2,url3,url4]. Instead, they would be ["url1,url2,url3","url1,url4"].
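
The KeyError is consistent with DataFrame.apply defaulting to axis=0: the lambda then receives each column as r, so r['code'] looks up a row label named 'code' that does not exist. The second problem is the one described above: iterating a group's images column yields whole comma-joined strings, which have to be split before deduplicating. Below is a minimal sketch of the same approach with both fixes, on hypothetical sample data; the names deduped and collect_urls are illustrative, and dict.fromkeys replaces set only to keep the URLs in first-seen order.

```python
import pandas as pd

# Hypothetical data mirroring the example rows in the question
csv = pd.DataFrame({
    "code": [1, 1],
    "something": ["x", "x"],
    "images": ["url1,url2,url3", "url1,url4"],
})

# Group the full frame before deduplicating, so every variant's images are kept
grouped = csv.groupby("code")
# .copy() gives an independent frame before a new column is added to it
deduped = csv.drop_duplicates(subset=["code"], keep="last").copy()


def collect_urls(row):
    """Split every comma-joined string in the row's group and drop repeats."""
    urls = (
        url
        for joined in grouped.get_group(row["code"])["images"]
        for url in joined.split(",")
    )
    # dict.fromkeys removes duplicates while keeping first-seen order
    return ",".join(dict.fromkeys(urls))


# axis=1 passes each row (not each column) to the function
deduped["images_all"] = deduped.apply(collect_urls, axis=1)
print(deduped)
```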

Do you know how to fix it?

EDIT

I also want to keep the other columns (they are the same for all rows with the same code; that's why I just drop_duplicates and keep the last row).
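
Since the other columns are identical within a code, another option is to build images_all once per code and attach it to the last row of each code. A sketch, assuming pandas 0.25 or later (for explode), no missing values in images, and hypothetical sample data: split each string into a list, explode to one URL per row, drop repeated (code, url) pairs, join each code's URLs back together, and map the result onto the deduplicated rows.

```python
import pandas as pd

# Hypothetical data mirroring the example rows in the question
df = pd.DataFrame({
    "code": [1, 1],
    "something": ["x", "x"],
    "images": ["url1,url2,url3", "url1,url4"],
})

# One row per (code, url): split the strings into lists, explode them,
# then drop repeated (code, url) pairs (first occurrence is kept)
urls = (
    df[["code"]]
    .assign(url=df["images"].str.split(","))
    .explode("url")
    .drop_duplicates()
)

# Join each code's unique URLs back into one comma-separated string
images_all = urls.groupby("code")["url"].agg(",".join)

# Keep the last row per code (other columns are equal within a code),
# drop the original images column and attach the combined one by code
result = (
    df.drop_duplicates(subset=["code"], keep="last")
    .drop(columns="images")
    .assign(images_all=lambda d: d["code"].map(images_all))
)
print(result)
```

Dropping the original images column matches the desired output shown in the question (code, something, images_all).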

Answer 1

Use GroupBy.transform with a custom function that flattens the split values, converts them to a set, and finally joins the unique values:

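import pandas as pd
# sample data from the question
df = pd.DataFrame({'code': [1, 1], 'something': ['x', 'x'],
                   'images': ['url1,url2,url3', 'url1,url4']})
# flatten the comma-split values of each group, deduplicate with a set, join back
f = lambda x: ','.join(set([z for y in x for z in y.split(',')]))
df['images_all'] = df.groupby('code')['images'].transform(f)
print(df)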
   code something          images           images_all
0     1         x  url1,url2,url3  url1,url3,url2,url4
1     1         x       url1,url4  url1,url3,url2,url4
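
Because a set is unordered, the URLs in images_all can come out in any order (url1,url3,url2,url4 above), and the order can differ between runs since Python randomizes string hashing by default. If the original order matters, one variant of the same transform swaps the set for dict.fromkeys, which drops duplicates but keeps first-seen order (dicts preserve insertion order from Python 3.7). The sample data is hypothetical, and the final drop_duplicates reproduces the asker's one-row-per-code step.

```python
import pandas as pd

# Hypothetical data mirroring the example rows in the question
df = pd.DataFrame({
    "code": [1, 1],
    "something": ["x", "x"],
    "images": ["url1,url2,url3", "url1,url4"],
})

# Same flattening as the answer, but dict.fromkeys keeps first-seen order
f = lambda x: ",".join(dict.fromkeys(z for y in x for z in y.split(",")))
df["images_all"] = df.groupby("code")["images"].transform(f)

# One row per code, keeping the last one as in the question
out = df.drop_duplicates(subset=["code"], keep="last")
print(out)
```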

Answer 1 accepted, 2020-07-09 13:06:49
