Grouping data using unique combinations

In the dataset below, I need to find the unique sequences and assign each one a sequence number.

Dataset:

user    age     maritalstatus   product
A       Young   married         111
B       young   married         222
C       young   Single          111
D       old     single          222
E       old     married         111
F       teen    married         222
G       teen    married         555
H       adult   single          444
I       adult   single          333

Unique sequences:

young   married     0
young   single      1
old     single      2
old     married     3
teen    married     4
adult   single      5
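
One way to produce this numbering is sketched below with pandas. It is a minimal sketch, not the asker's code: the variable names `df`, `keys`, `seq` and `unique_sequences` are made up for illustration. It also assumes that "Young"/"young" and "Single"/"single" are meant to be the same value, so both columns are lower-cased first. `groupby(..., sort=False).ngroup()` numbers the groups in order of first appearance.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user": list("ABCDEFGHI"),
    "age": ["Young", "young", "young", "old", "old",
            "teen", "teen", "adult", "adult"],
    "maritalstatus": ["married", "married", "Single", "single", "married",
                      "married", "married", "single", "single"],
    "product": [111, 222, 111, 222, 111, 222, 555, 444, 333],
})

# Assumption: values that differ only in letter case are the same category.
keys = ["age", "maritalstatus"]
for col in keys:
    df[col] = df[col].str.strip().str.lower()

# sort=False keeps groups in order of first appearance, so the first
# combination seen gets 0, the next new one gets 1, and so on.
df["seq"] = df.groupby(keys, sort=False).ngroup()

# One row per unique (age, maritalstatus) combination with its number.
unique_sequences = df[keys + ["seq"]].drop_duplicates().reset_index(drop=True)
print(unique_sequences)
```

Without the lower-casing step, "Young married" and "young married" would be treated as two different sequences.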

Having found the unique values shown above, if I pass in a data frame like the one below (newdataframe):

user    age maritalstatus  
A      Young   married 
X      young   Single  
D      old     single  
Z      old     married

it should return the products to me as a list.

A: [222] - user A has already purchased 111, and the matching sequence also contains 222, so 222 is returned.
X: [111, 222]
D: [] - only one row has this sequence, and D has already purchased its product 222, so nothing is returned.
Z: [111] - matches the sequence of user E, so 111 is returned.

If there is no matching sequence, as shown below

user     age     maritalstatus  
    Y     adult  married

it should give me an empty list

 Y : []

You can use sets: the module provides classes for constructing and manipulating unordered collections of unique elements.

Take a look: https://docs.python.org/2/library/sets.html
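
The set-based idea could look roughly like the sketch below. It uses the built-in `set` type rather than the `sets` module linked above, which was deprecated in Python 2.6 and does not exist in Python 3. The names `rows`, `key`, `products_by_group`, `purchased_by_user` and `recommend` are hypothetical, and matching is assumed to be case-insensitive on (age, maritalstatus). Under that assumption the sample data gives [111] for user X, because C is the only "young single" row, which differs from the [111, 222] listed in the question.

```python
from collections import defaultdict

# Sample data copied from the question: (user, age, maritalstatus, product).
rows = [
    ("A", "Young", "married", 111),
    ("B", "young", "married", 222),
    ("C", "young", "Single", 111),
    ("D", "old", "single", 222),
    ("E", "old", "married", 111),
    ("F", "teen", "married", 222),
    ("G", "teen", "married", 555),
    ("H", "adult", "single", 444),
    ("I", "adult", "single", 333),
]


def key(age, status):
    """Normalize one (age, maritalstatus) pair; case-insensitivity is an assumption."""
    return (age.strip().lower(), status.strip().lower())


products_by_group = defaultdict(set)  # sequence -> products bought in it
purchased_by_user = defaultdict(set)  # user -> products already bought
for user, age, status, product in rows:
    products_by_group[key(age, status)].add(product)
    purchased_by_user[user].add(product)


def recommend(user, age, status):
    """Products seen in the user's sequence minus those the user already bought."""
    group_products = products_by_group.get(key(age, status), set())
    already_bought = purchased_by_user.get(user, set())
    return sorted(group_products - already_bought)


for query in [("A", "Young", "married"), ("X", "young", "Single"),
              ("D", "old", "single"), ("Z", "old", "married"),
              ("Y", "adult", "married")]:
    print(query[0], recommend(*query))
```

The set difference removes anything the user has already purchased, and a sequence that never occurs in the data yields an empty list.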
