Assign values to the groups of a groupby in pandas

I would like to select my raw data based on the validity of their ranges. I have an instrument whose most sensitive setting is C, followed by B and then A. So I start with C and check whether all the values are below a threshold; if they are, that's perfect, and I set all the data at this sensitivity to best=1.

import pandas as pd
from io import StringIO

a = """category,val,sensitivity_level
x,20,A
x,31,B
x,60,C
x,20,A
x,25,B
x,60,C
y,20,A
y,40,B
y,60,C
y,20,A
y,24,B
y,30,C"""

df = pd.read_csv(StringIO(a))

def grp_level_1(x):
    """
    Return whether all the elements are at or below the threshold.
    """
    return (x <= 30).all()

def grp_level_0(x):
    """
    Input: data grouped by category. Here I want to go through the sensitivity levels
    in descending order, that is C, B and then A, and wherever one of these levels has
    x <= 30 for all elements, select that level as the best level. Think of a device
    sensitivity: at the highest sensitivity the data may be garbage, so you would like
    to move down the sensitivity and check again.
    """
    x['islessthan30'] = x.groupby('sensitivity_level')['val'].transform(grp_level_1)
    return x

print(df.groupby('category').apply(grp_level_0))

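Unfortunately, the code above does not produce the matrix below, because:

- I cannot sort a groupby in descending order
- I cannot assign values to a groupby of a groupby

Desired output (the last column is the best flag; note that the y/B row shows 29 here, while the input above has 40):

category,val,sensitivity_level,best
x,20,A,1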
x,31,B,0
x,60,C,0
x,20,A,1
x,25,B,0
x,60,C,0
y,20,A,0
y,29,B,1
y,60,C,0
y,20,A,0
y,24,B,1
y,30,C,0

Any hints?

The algorithm should be as follows:

Within a category, start with the highest sensitivity. If all its values are at or below the threshold, set this sensitivity to 1 and skip the remaining, lower sensitivities; otherwise move on to the next lower sensitivity and check again.
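The procedure just described can be written almost literally as a loop. This is a minimal sketch, not the asker's code: it assumes the threshold is inclusive (`<= 30`, matching the question's function) and that the sensitivity order is C > B > A; the names `THRESHOLD`, `ORDER` and `best_flags` are hypothetical.

```python
import pandas as pd
from io import StringIO

data = """category,val,sensitivity_level
x,20,A
x,31,B
x,60,C
x,20,A
x,25,B
x,60,C
y,20,A
y,40,B
y,60,C
y,20,A
y,24,B
y,30,C"""

df = pd.read_csv(StringIO(data))

THRESHOLD = 30            # assumed inclusive threshold
ORDER = ["C", "B", "A"]   # assumed order: most sensitive first

def best_flags(group):
    """Return a 0/1 Series marking the rows of the first valid sensitivity level."""
    flags = pd.Series(0, index=group.index)
    for level in ORDER:
        rows = group["sensitivity_level"] == level
        if rows.any() and (group.loc[rows, "val"] <= THRESHOLD).all():
            flags.loc[rows] = 1
            break  # skip the remaining, less sensitive levels
    return flags

df["best"] = 0
for _, group in df.groupby("category"):
    # Assignment aligns on the index, so only this category's rows change.
    df.loc[group.index, "best"] = best_flags(group)

print(df)
```

With the input exactly as given (y's B value is 40, not 29), neither C nor B passes for y, so both categories fall back to A.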

I think you're looking for something like this:

In [28]: df
Out[28]: 
   category  val sensitivity_level
0         x   20                 A
1         x   31                 B
2         x   60                 C
3         x   20                 A
4         x   25                 B
5         x   60                 C
6         y   20                 A
7         y   40                 B
8         y   60                 C
9         y   20                 A
10        y   24                 B
11        y   30                 C

In [29]: res = df.groupby(['category', 'sensitivity_level']).max()

In [30]: res
Out[30]: 
                            val
category sensitivity_level     
x        A                   20
         B                   31
         C                   60
y        A                   20
         B                   40
         C                   60

In [31]: res[res.val <= 30]
Out[31]: 
                            val
category sensitivity_level     
x        A                   20
y        A                   20

So you group by category and sensitivity level and take the maximum of each group: a level passes only if its largest value is at or below 30. The last line gives the valid sensitivity levels for each category. This avoids creating an intermediate column that says whether or not each level is below 30.
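If a per-row flag is still wanted (the `islessthan30` column the question was trying to build), the same max-based check can be broadcast back onto every row with `transform('max')`. A minimal sketch, assuming the inclusive threshold of 30; the column name `level_ok` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "category": list("xxxxxxyyyyyy"),
    "val": [20, 31, 60, 20, 25, 60, 20, 40, 60, 20, 24, 30],
    "sensitivity_level": list("ABCABCABCABC"),
})

# Maximum value of each (category, sensitivity_level) group,
# repeated on every row that belongs to that group.
group_max = df.groupby(["category", "sensitivity_level"])["val"].transform("max")

# A level is valid when even its largest value is within the threshold.
df["level_ok"] = group_max <= 30
print(df)
```

This only marks which levels are valid; choosing the most sensitive valid level per category is still a separate step.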

Suppose that the one x=31 was actually 20:

In [33]: df.loc[1, 'val'] = 20

In [34]: df
Out[34]: 
   category  val sensitivity_level
0         x   20                 A
1         x   20                 B
2         x   60                 C
3         x   20                 A
4         x   25                 B
5         x   60                 C
6         y   20                 A
7         y   40                 B
8         y   60                 C
9         y   20                 A
10        y   24                 B
11        y   30                 C

Then we'd expect x to use B and y to still use A. We can amend the last step a bit:

In [51]: res = df.groupby(['category', 'sensitivity_level']).max()
In [48]: x = res[res.val <= 30]

In [49]: x
Out[49]: 
                            val
category sensitivity_level     
x        A                   20
         B                   25
y        A                   20

In [71]: x.reset_index('category').sort_index(ascending=False).groupby(level='sensitivity_level').first()
Out[71]: 
                  category  val
sensitivity_level              
A                        y   20
B                        x   25

There's probably a better way to do the last step.
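One possible alternative for the last step, sketched under the assumption that sensitivity increases A < B < C (encoded in the hypothetical `RANK` mapping): group the valid levels by category rather than by level, keep the most sensitive one, and map that choice back onto the original rows to get the `best` column from the question. Grouping by `sensitivity_level`, as above, yields one row per level rather than one per category, so two categories that share the same best level could not both appear. The name `best_level` is likewise hypothetical.

```python
import pandas as pd

# Same data as above, with x's 31 replaced by 20 (row 1).
df = pd.DataFrame({
    "category": list("xxxxxxyyyyyy"),
    "val": [20, 20, 60, 20, 25, 60, 20, 40, 60, 20, 24, 30],
    "sensitivity_level": list("ABCABCABCABC"),
})

RANK = {"A": 0, "B": 1, "C": 2}  # assumed: higher rank = more sensitive

# Largest value per (category, level); keep only the levels that pass.
res = df.groupby(["category", "sensitivity_level"])["val"].max().reset_index()
valid = res[res["val"] <= 30].copy()
valid["rank"] = valid["sensitivity_level"].map(RANK)

# Most sensitive valid level for each category.
best_level = valid.sort_values("rank").groupby("category")["sensitivity_level"].last()

# Flag the rows that belong to each category's chosen level.
df["best"] = (df["sensitivity_level"] == df["category"].map(best_level)).astype(int)
print(best_level)
print(df)
```

A category with no valid level at all gets no entry in `best_level`, so `map` yields NaN for it and all of its rows end up with best=0.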

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.