Pandas groupby count values above threshold

Question

I have a groupby question that I can't solve. It is probably simple, but I can't get it to work nicely. I am trying to compute some statistics on a variable with pandas groupby chained with the very handy agg function. I would like add to the list below a calculation of the number of values above a given threshold.

df = df.groupby(['scenario','Name','year','month'])["Value"].agg([np.min,np.max,np.mean,np.std])

Usually, I compute the number of values above a given threshold as shown below, but I can't find a way to add this to the aggregation function. Do you know how I could do that?

df =df[df>0].groupby(['scenario','Name','year','month']).count()

Answer 1

Your answer works. Else you could add it to the one line, not needing to create a separate function by using lambda x: instead.

df = df.groupby(["scenario", "Name", "year", "month"])["Value"].agg([np.min, np.max, np.mean, np.std, lambda x: ((x > 0)*1).sum()])

The logic here: (x > 0) returns True/False bool; *1 turns the bool to an integer (1 = True, 0 = False); .sum() will sum all the 1s and 0s within the group - and as those that are True = 1, the sum will count all values greater than 0.

Running a quick test on the time taken, your solution is faster, but I thought I would give an alternative solution anyway.

Answer 2

I found a solution by creating a function and passing it in the agg function.

def counta(x):
    m = np.count_nonzero(x > 10)
    return m 

df = df.groupby(['scenario','Name','year','month'])["Value"].agg([np.min,np.max,np.mean,np.std,counta])

Pandas groupby count values above threshold

Question

2 answers

solution1
1 ACCPTED 2022-04-06 17:00:46

solution2
0 2022-04-04 18:04:50

Pandas groupby count values above threshold

Question

2 answers

solution1 1 ACCPTED 2022-04-06 17:00:46

solution2 0 2022-04-04 18:04:50

solution1
1 ACCPTED 2022-04-06 17:00:46

solution2
0 2022-04-04 18:04:50