Assigning values to the groups of a groupby in pandas

Question

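I want to select raw data based on the validity of its range. There is an instrument whose most sensitive setting is C, followed by B, then A. So, starting from C, check whether all values are below the threshold; if they are, that is perfect, and all data at this sensitivity should be set to best = 1.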

from io import StringIO

import pandas as pd

a = """category,val,sensitivity_level
x,20,A
x,31,B
x,60,C
x,20,A
x,25,B
x,60,C
y,20,A
y,40,B
y,60,C
y,20,A
y,24,B
y,30,C"""

df = pd.read_csv(StringIO(a))

def grp_1evel_1(x):
    """
    Return whether all the elements are at or below the threshold.
    """
    return (x <= 30).all()

def grp_1evel_0(x):
    """
    Input: data grouped by category. Here I want to go through the sensitivity
    levels in descending order, that is C, B and then A, and select the first
    level for which x <= 30 holds for all elements as the best one. Think of a
    device's sensitivity: at the highest sensitivity the data may be garbage,
    so you would like to move down the sensitivity and check again.
    """
    x['islessthan30'] = x.groupby('sensitivity_level')['val'].transform(grp_1evel_1)
    return x

print(df.groupby('category').apply(grp_1evel_0))

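Unfortunately, the code above does not produce the matrix below, because (1) I cannot sort a groupby in descending order, and (2) I cannot assign values to a groupby of a groupby: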

x,20,A,1
x,31,B,0
x,60,C,0
x,20,A,1
x,25,B,0
x,60,C,0
y,20,A,0
y,29,B,1
y,60,C,0
y,20,A,0
y,24,B,1
y,30,C,0

Any hints?

The algorithm should be as follows:

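Within each category, start from the highest sensitivity. If all values at that sensitivity are below the threshold, set that sensitivity to 1 and skip the remaining lower sensitivities.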
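The algorithm above could be transcribed fairly literally as a loop over the levels of each category. This is only a minimal sketch: `pick_best_level` and `THRESHOLD` are hypothetical names, the comparison is `<= 30` as in the question's code, and levels are assumed to sort alphabetically by sensitivity (C highest). With the input data exactly as listed (the y/B group contains 40), both categories fall back to A.

```python
from io import StringIO

import pandas as pd

CSV = """category,val,sensitivity_level
x,20,A
x,31,B
x,60,C
x,20,A
x,25,B
x,60,C
y,20,A
y,40,B
y,60,C
y,20,A
y,24,B
y,30,C"""

THRESHOLD = 30  # assumed cut-off, matching the question's x <= 30


def pick_best_level(group, threshold=THRESHOLD):
    """Return the most sensitive level whose values are all <= threshold."""
    # Walk the levels from most to least sensitive: C, then B, then A.
    for level in sorted(group["sensitivity_level"].unique(), reverse=True):
        vals = group.loc[group["sensitivity_level"] == level, "val"]
        if (vals <= threshold).all():
            return level  # stop here, skipping the lower sensitivities
    return None  # no level qualifies in this category


df = pd.read_csv(StringIO(CSV))
best_level = {cat: pick_best_level(g) for cat, g in df.groupby("category")}
print(best_level)
```

Returning `None` when no level qualifies is a design choice of this sketch; the question does not say what should happen in that case.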

Answer 1

I think you are looking for something like this:

In [28]: df
Out[28]: 
   category  val sensitivity_level
0         x   20                 A
1         x   31                 B
2         x   60                 C
3         x   20                 A
4         x   25                 B
5         x   60                 C
6         y   20                 A
7         y   40                 B
8         y   60                 C
9         y   20                 A
10        y   24                 B
11        y   30                 C

In [29]: res = df.groupby(['category', 'sensitivity_level']).max()

In [30]: res
Out[30]: 
                            val
category sensitivity_level     
x        A                   20
         B                   31
         C                   60
y        A                   20
         B                   40
         C                   60

In [31]: res[res.val <= 30]
Out[31]: 
                            val
category sensitivity_level     
x        A                   20
y        A                   20

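So you can group by both category and sensitivity level. The last line gives the sensitivity levels that qualify in each category. This avoids creating an intermediate column that says whether each level is at or below 30.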
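The reason the maximum is enough: every value in a group is at or below 30 exactly when the group's largest value is. A small check of that equivalence on the same data, as a sketch (the names `by_max` and `by_all` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    "category": list("xxxxxxyyyyyy"),
    "val": [20, 31, 60, 20, 25, 60, 20, 40, 60, 20, 24, 30],
    "sensitivity_level": list("ABCABCABCABC"),
})

keys = ["category", "sensitivity_level"]

# Route 1: take each group's maximum and compare it to the threshold.
by_max = df.groupby(keys)["val"].max() <= 30

# Route 2: compare every value first, then require all of them per group.
by_all = (df["val"] <= 30).groupby([df["category"], df["sensitivity_level"]]).all()

print((by_max == by_all).all())
```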

Suppose the x value of 31 were actually 20:

In [33]: df.loc[1, 'val'] = 20

In [34]: df
Out[34]: 
   category  val sensitivity_level
0         x   20                 A
1         x   20                 B
2         x   60                 C
3         x   20                 A
4         x   25                 B
5         x   60                 C
6         y   20                 A
7         y   40                 B
8         y   60                 C
9         y   20                 A
10        y   24                 B
11        y   30                 C

Then we expect x to use B, while y still uses A. We can modify the last step slightly:

In [51]: res = df.groupby(['category', 'sensitivity_level']).max()
In [48]: x = res[res.val <= 30]

In [49]: x
Out[49]: 
                            val
category sensitivity_level     
x        A                   20
         B                   25
y        A                   20

In [71]: x.reset_index('category').sort_index(ascending=False).groupby(level='sensitivity_level').first()
Out[71]: 
                  category  val
sensitivity_level              
A                        y   20
B                        x   25

There may be a better way to do the last step.
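One possible alternative for the last step is to group the surviving rows by category instead of by sensitivity level, and then map the chosen level back onto the original rows to get the `best` column the question asks for. This is a sketch, not the answer's own method: the names `ok` and `best_level` are made up here, and it relies on the level labels sorting alphabetically by sensitivity.

```python
import pandas as pd

df = pd.DataFrame({
    "category": list("xxxxxxyyyyyy"),
    # Row 1 is already set to 20, as in the modified frame above.
    "val": [20, 20, 60, 20, 25, 60, 20, 40, 60, 20, 24, 30],
    "sensitivity_level": list("ABCABCABCABC"),
})

res = df.groupby(["category", "sensitivity_level"]).max()
ok = res[res.val <= 30].reset_index()

# Levels sort alphabetically, so max() picks the most sensitive survivor.
best_level = ok.groupby("category")["sensitivity_level"].max()

# Flag the rows whose level is the one chosen for their category.
df["best"] = (df["sensitivity_level"] == df["category"].map(best_level)).astype(int)

print(best_level.to_dict())
print(df)
```

A category with no qualifying level is absent from `best_level`, so `map` yields a missing value there and all of its rows get `best = 0`.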

Solution 1
4, accepted, 2013-11-05 18:04:53
