I have a groupby question that I can't solve. It is probably simple, but I can't get it to work nicely. I am trying to compute some statistics on a variable with pandas groupby chained with the very handy agg function. I would like add to the list below a calculation of the number of values above a given threshold.
df = df.groupby(['scenario','Name','year','month'])["Value"].agg([np.min,np.max,np.mean,np.std])
Usually, I compute the number of values above a given threshold as shown below, but I can't find a way to add this to the aggregation function. Do you know how I could do that?
df =df[df>0].groupby(['scenario','Name','year','month']).count()
Your answer works. Else you could add it to the one line, not needing to create a separate function by using lambda x:
instead.
df = df.groupby(["scenario", "Name", "year", "month"])["Value"].agg([np.min, np.max, np.mean, np.std, lambda x: ((x > 0)*1).sum()])
The logic here: (x > 0)
returns True/False bool; *1
turns the bool to an integer (1 = True, 0 = False); .sum()
will sum all the 1s and 0s within the group - and as those that are True = 1, the sum will count all values greater than 0.
Running a quick test on the time taken, your solution is faster, but I thought I would give an alternative solution anyway.
I found a solution by creating a function and passing it in the agg function.
def counta(x):
m = np.count_nonzero(x > 10)
return m
df = df.groupby(['scenario','Name','year','month'])["Value"].agg([np.min,np.max,np.mean,np.std,counta])
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.