
I'm trying to sort a pandas aggregation

I'm trying to aggregate and sort data from my dataset, but I don't know how to do it. Can someone help me?

import pandas as pd

data = {'message_id': ['1', '1', '1', '1', '2', '2', '2'],
        'to': ['one', 'two', 'three', 'four', 'five', 'six', 'five'],
        'idt': ['1', '2', '3', '4', '5', '6', '5']
        }

df = pd.DataFrame(data, columns=['message_id', 'to', 'idt'])

# collect the unique values of each column per group
agg_func_text = {'to': [set], 'idt': [set]}

# note: sort_values returns a new DataFrame; the result is not assigned here
df.sort_values(by=['message_id', 'to'])

df3 = df.groupby(['message_id']).agg(agg_func_text)

This gives the following result:

message_id  to set                    idt set
1           {four, three, one, two}   {2, 3, 1, 4}
2           {five, six}               {5, 6}

But I would like to get this result:

message_id  to set                    idt set
1           {one, two, three, four}   {1, 2, 3, 4}
2           {five, six}               {5, 6}

A Python set has no defined order, so it cannot be sorted or reordered in place. A possible solution is to use the dict.fromkeys() trick to remove duplicates while keeping the first-seen order (dicts preserve insertion order in Python 3.7+) and to return the result as a tuple, which does have a defined order. Sort the DataFrame first if the values need to come out sorted:

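# dict.fromkeys keeps the first occurrence of each value, in order;
# tuple() turns the keys into an ordered result
f = lambda x: tuple(dict.fromkeys(x))
agg_func_text = {'to': f, 'idt': f}

# if sorting is needed, assign the sorted DataFrame back
df = df.sort_values(by=['message_id', 'idt'])

df3 = df.groupby('message_id').agg(agg_func_text)

print(df3)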
                                 to           idt
message_id                                       
1           (one, two, three, four)  (1, 2, 3, 4)
2                       (five, six)        (5, 6)
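
To see the deduplication step in isolation, here is a minimal sketch in plain Python. The helper name unique_in_order is an assumption made for illustration and does not come from the answer; the sample values mimic one group after sorting by 'idt'.

```python
def unique_in_order(values):
    """Drop duplicates while keeping first-seen order; return a tuple."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return tuple(dict.fromkeys(values))


# Values as they might appear within group '2' after sorting by 'idt'
to_values = ['five', 'five', 'six']
idt_values = ['5', '5', '6']

print(unique_in_order(to_values))   # ('five', 'six')
print(unique_in_order(idt_values))  # ('5', '6')
```

Because the helper only keeps the order it is given, the result is sorted only if the rows were sorted before grouping.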

Sort using the numbers in a dictionary, save the results, then translate each number back to its word representation.

import pandas as pd

data = {'message_id': ['1', '1', '1', '1', '2', '2', '2'],
        'to': ['one', 'two', 'three', 'four', 'five', 'six', 'five'],
        'idt': ['1', '2', '3', '4', '5', '6', '5']
        }

df = pd.DataFrame(data, columns=['message_id', 'to', 'idt'])
print(df)

agg_func_text = {'to': set, 'idt': set}
grouped = df.groupby(['message_id']).agg(agg_func_text)

# sets are unordered, so sort each one into a list
# (note: 'idt' holds strings, so this is a lexicographic sort)
grouped['idt'] = grouped['idt'].apply(sorted)

# map words to numbers, sort the numbers, then map them back to words
dct = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
       'six': 6, 'seven': 7, 'eight': 8, 'nine': 9}
dct2 = {number: word for word, number in dct.items()}
grouped['to'] = grouped['to'].apply(lambda x: sorted(dct[item] for item in x))
grouped['to'] = grouped['to'].apply(lambda x: [dct2[item] for item in x])
print(grouped)

Output:
                                 to           idt
message_id                                       
1           [one, two, three, four]  [1, 2, 3, 4]
2                       [five, six]        [5, 6]
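
As a possible variation on the same idea, the round trip through numbers can be replaced by a sort key. The sketch below is plain Python; WORD_RANK, sort_number_words, and sort_numeric_strings are hypothetical names introduced here, not part of the answer.

```python
# Hypothetical lookup table: number word -> numeric rank
WORD_RANK = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
             'six': 6, 'seven': 7, 'eight': 8, 'nine': 9}


def sort_number_words(words):
    """Return the unique words ordered by their numeric value."""
    # set() removes duplicates; the key looks up each word's rank
    # (an unknown word raises KeyError)
    return sorted(set(words), key=lambda word: WORD_RANK[word])


def sort_numeric_strings(values):
    """Return the unique numeric strings ordered by integer value."""
    # key=int avoids lexicographic order, where '10' would sort before '9'
    return sorted(set(values), key=int)


print(sort_number_words({'four', 'three', 'one', 'two'}))  # ['one', 'two', 'three', 'four']
print(sort_numeric_strings(['10', '9', '2', '9']))         # ['2', '9', '10']
```

Either function could be applied to a column of sets with Series.apply, in the same way the answer applies its lambdas.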
