Want to create column with lists of unique values using groupby and transform

Here is a sample dataset:

import pandas as pd

test = pd.DataFrame({
    'a': [1, 2, 3] * 2,
    'b': ['a', 'a', 'b', 'b', 'b', 'b'],
    'c': [123, 456, 456, 123, 456, 123],
})

print(test)

   a  b    c
0  1  a  123
1  2  a  456
2  3  b  456
3  1  b  123
4  2  b  456
5  3  b  123

If I group by columns 'a' and 'b' and then try to get the list of unique values of 'c' in each group using transform, I don't get the expected result:

# using transform
print(test.groupby([
    'a',
    'b',
]).c.transform(pd.Series.unique))

0    123
1    456
2    456
3    123
4    456
5    123

If I use unique instead, I almost get the expected output:

# almost expected output
print(test.groupby([
    'a',
    'b',
]).c.unique())

a  b
1  a         [123]
   b         [123]
2  a         [456]
   b         [456]
3  b    [456, 123]
Name: c, dtype: object

What I was hoping to get from transform is a pd.Series that looks like this:

Expected Output

0         [123]
1         [456]
2    [456, 123]
3         [123]
4         [456]
5    [456, 123]
dtype: object

I know that I can use transform to get the number of unique values of 'c' per group as a Series, like this:

print(test.groupby([
    'a',
    'b',
]).c.transform(pd.Series.nunique))

0    1
1    1
2    2
3    1
4    1
5    2
Name: c, dtype: int64

Question

Why can't I do something similar with unique and transform?

Side Note

I know that I can do the groupby and unique, then reset_index and merge with the original data, but I'm hoping for a more Pythonic, pandas-friendly method.
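The merge-based workaround mentioned above can be sketched roughly as follows. This is only an illustration of that route, not a recommended solution; the names lookup, merged and c_unique are made up for this example.

```python
import pandas as pd

test = pd.DataFrame({
    'a': [1, 2, 3] * 2,
    'b': ['a', 'a', 'b', 'b', 'b', 'b'],
    'c': [123, 456, 456, 123, 456, 123],
})

# One row per (a, b) group, holding the array of unique 'c' values.
# 'c_unique' is a hypothetical column name chosen for this sketch.
lookup = (
    test.groupby(['a', 'b'])['c']
    .unique()
    .rename('c_unique')
    .reset_index()
)

# A left merge keeps every original row and attaches its group's array.
merged = test.merge(lookup, on=['a', 'b'], how='left')
print(merged)
```

The new column holds one NumPy array per row, so its dtype is object.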

I also tried using set with transform, but that raised an error:

print(test.groupby([
    'a',
    'b',
]).c.transform(set))

TypeError: 'set' type is unordered

Does this work for you?

test.groupby(['a','b'])['c'].transform('unique')

Output:

0         [123]
1         [456]
2    [456, 123]
3         [123]
4         [456]
5    [456, 123]
Name: c, dtype: object
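As for why passing pd.Series.unique as a callable returned the original values, here is one possible explanation, offered as a sketch rather than a statement about pandas internals. transform expects each group's result to be either a scalar to broadcast (as with nunique) or something the same length as the group. In this sample every group happens to contain only distinct values, so unique returns an array exactly as long as the group, and that array would be aligned row by row, reproducing column 'c'. The check below only verifies the length coincidence; sizes is a name introduced for this example.

```python
import pandas as pd

test = pd.DataFrame({
    'a': [1, 2, 3] * 2,
    'b': ['a', 'a', 'b', 'b', 'b', 'b'],
    'c': [123, 456, 456, 123, 456, 123],
})

# For every (a, b) group, record the group size next to the number of
# unique 'c' values it contains.
sizes = {}
for key, group in test.groupby(['a', 'b'])['c']:
    sizes[key] = (len(group), len(group.unique()))
    print(key, sizes[key])

# In this dataset the two numbers match for every group.
same_length = all(n_rows == n_unique for n_rows, n_unique in sizes.values())
print(same_length)
```

Whether transform accepts the string 'unique' may also depend on the pandas version. If it is rejected, the groupby, unique and merge route described in the question remains available.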
