Pandas groupby to new column
I have a dataframe with columns code and images.
Column images is a string of URLs joined by commas: <URL>,<URL2>,...
Column code is NOT unique. I need to make it unique while storing all images (from all variants) in a new column, images_all.
For example:
code something images
1 x url1,url2,url3
1 x url1,url4
Result is:
code something images_all
1 x url1,url2,url3,url4
I did:
grouped = csv.groupby('code')
csv = csv.drop_duplicates(subset=['code'], keep='last')
csv['images_all'] = csv.apply(lambda r: list(set(
[image for image in grouped.get_group(r['code'])['images']]
)))
which raises:
KeyError: 'code'
But even if it didn't raise this, the problem is that images_all wouldn't be [url1,url2,url3,url4]. Instead, it would be ["url1,url2,url3","url1,url4"].
Do you know how to fix it?
EDIT
I also want to keep the other columns (they are the same for all rows with the same code, which is why I then just drop_duplicates and keep the last row).
Use GroupBy.transform with a custom function that splits each value on commas, flattens the results, converts them to a set, and finally joins the unique values:
# Split each row's string on commas, flatten, de-duplicate with a set, re-join
f = lambda x: ','.join(set(z for y in x for z in y.split(',')))
df['images_all'] = df.groupby('code')['images'].transform(f)
print(df)
code something images images_all
0 1 x url1,url2,url3 url1,url3,url2,url4
1 1 x url1,url4 url1,url3,url2,url4
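Note that a set does not preserve order, which is why the URLs above come out as url1,url3,url2,url4. If the first-seen order matters, one possible variant (a sketch, not part of the original answer) is to de-duplicate with dict.fromkeys instead, and then collapse to one row per code as the question's EDIT describes. The helper name join_unique and the sample frame are illustrative assumptions; the sketch also assumes the other columns really are identical within each code.

```python
import pandas as pd

# Sample data mirroring the question (illustrative only)
df = pd.DataFrame({
    "code": [1, 1],
    "something": ["x", "x"],
    "images": ["url1,url2,url3", "url1,url4"],
})

def join_unique(urls):
    # Split every row's string on commas and flatten into one list
    flat = [u for row in urls for u in row.split(",")]
    # dict.fromkeys drops repeats while keeping first-seen order (Python 3.7+)
    return ",".join(dict.fromkeys(flat))

# Broadcast the combined string to every row of its group
df["images_all"] = df.groupby("code")["images"].transform(join_unique)

# Keep one row per code; the per-variant images column is no longer needed
result = df.drop_duplicates(subset=["code"], keep="last").drop(columns="images")
print(result)
```

Because transform returns a value for every row of the group, the other columns stay in place and drop_duplicates can then pick any row of each group.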