简体   繁体   中英

Pandas groupby count values above threshold

I have a groupby question that I can't solve. It is probably simple, but I can't get it to work nicely. I am trying to compute some statistics on a variable with pandas groupby chained with the very handy agg function. I would like add to the list below a calculation of the number of values above a given threshold.

df = df.groupby(['scenario','Name','year','month'])["Value"].agg([np.min,np.max,np.mean,np.std]) 

Usually, I compute the number of values above a given threshold as shown below, but I can't find a way to add this to the aggregation function. Do you know how I could do that?

df =df[df>0].groupby(['scenario','Name','year','month']).count()

Your answer works. Else you could add it to the one line, not needing to create a separate function by using lambda x: instead.

df = df.groupby(["scenario", "Name", "year", "month"])["Value"].agg([np.min, np.max, np.mean, np.std, lambda x: ((x > 0)*1).sum()])

The logic here: (x > 0) returns True/False bool; *1 turns the bool to an integer (1 = True, 0 = False); .sum() will sum all the 1s and 0s within the group - and as those that are True = 1, the sum will count all values greater than 0.

Running a quick test on the time taken, your solution is faster, but I thought I would give an alternative solution anyway.

I found a solution by creating a function and passing it in the agg function.

def counta(x):
    m = np.count_nonzero(x > 10)
    return m 

df = df.groupby(['scenario','Name','year','month'])["Value"].agg([np.min,np.max,np.mean,np.std,counta])

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM