
Bin counts in a stacked (weighted) histogram for x-coordinates greater than a certain value

Let's say I have two datasets, and I plot a stacked histogram of both with some weights. How can I find the total bin count for data elements greater than a certain number (i.e. for x-coordinates greater than a certain value)? To illustrate my question, I have done the following:

import matplotlib.pyplot as plt
import numpy as np

data1 = np.random.normal(0,0.6,1000)
data2 = np.random.normal(0,1.4,1000)

weight1 = np.array([0.5]*len(data1))
weight2 = np.array([0.9]*len(data2))

hist = plt.hist((data1,data2),weights=(weight1,weight2),stacked=True,range=(-5,5))

plt.show()


Now, how would I know the bin counts for, say, x greater than -2?

So far, to get that answer, I have been doing the following:

n1,_,_ = plt.hist((data1,data2),weights=(weight1,weight2),stacked=False,range=(-2,10000))
bin_counts=sum(sum(n1))
print(bin_counts)

Here, I set the upper limit of range to some very large number, so that I get all the bin counts for x = -2 and greater.

Is there a more efficient way than this?

Also, how would I obtain bin_counts as a function of x, where x varies from the minimum to the maximum x-coordinate in some steps?

Any help will be greatly appreciated!

Thanks much!

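You could do the following:

# n is a list of arrays here, one per dataset, because you have two histograms
# (use stacked=False, as in the question's counting code; with stacked=True the returned arrays are cumulative across datasets)
n, bins, _ = plt.hist(...)
# x is the threshold, e.g. x = -2
# bins holds the bin edges, so zip pairs each count with the left edge of its bin;
# for each dataset, keep the counts of the bins whose left edge is greater than x
n_over_x = [[val for val, left_edge in zip(selected_cnt, bins) if left_edge > x] for selected_cnt in n]
# sum up the list of lists
result = sum(sum(part_list) for part_list in n_over_x)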
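If the plot itself is not needed for the count, the same number can be obtained without any histogram. The following is only a minimal sketch, assuming the data and weights from the question; `weighted_count_above` is a hypothetical helper name and the random seed is an arbitrary choice. It sums the weights of all elements strictly greater than the threshold, so the result is exact at `x` rather than tied to a bin edge.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
data1 = rng.normal(0, 0.6, 1000)
data2 = rng.normal(0, 1.4, 1000)
weight1 = np.full(len(data1), 0.5)
weight2 = np.full(len(data2), 0.9)


def weighted_count_above(x, datasets, weights):
    """Sum the weights of all elements strictly greater than x (hypothetical helper)."""
    return sum(w[d > x].sum() for d, w in zip(datasets, weights))


total = weighted_count_above(-2, (data1, data2), (weight1, weight2))
print(total)
```

Because no upper limit is involved, there is no need to pick an artificially large value for `range`.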

Here's what I came up with:

def my_range(start, end, step):
    # like range(), but the end point is included
    while start <= end:
        yield start
        start += step

bin_min = -5
bin_max = 10
bin_step = 1

b_counts = []  # weighted event counts (i.e. counts scaled by the weights)
value = []     # the threshold x that each entry of b_counts belongs to

for x in my_range(bin_min, bin_max, bin_step):
    # histogram of everything from x up to a very large upper limit
    n1, _, _ = plt.hist((data1, data2), weights=(weight1, weight2), stacked=False, range=(x, 10000))
    b_counts.append(sum(sum(n1)))
    value.append(x)
    print(b_counts[-1], value[-1])

I believe this gives me the weighted number of events in the histogram in the range (value, 10000), where value is the variable threshold.
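For a whole sweep of thresholds, one histogram per dataset followed by a reversed cumulative sum may be enough, instead of re-plotting for every threshold. This is only a sketch under assumptions: the thresholds double as bin edges (-5 to 10 in steps of 1, as in the loop above), the seed is arbitrary, and `np.histogram` stands in for `plt.hist` since no plot is needed. Values outside the range covered by `edges` are not counted.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
data1 = rng.normal(0, 0.6, 1000)
data2 = rng.normal(0, 1.4, 1000)
weight1 = np.full(len(data1), 0.5)
weight2 = np.full(len(data2), 0.9)

# Thresholds -5, -4, ..., 10; they also serve as the bin edges
edges = np.arange(-5, 11, 1)

# Weighted counts per bin, summed over both datasets
counts = sum(
    np.histogram(d, bins=edges, weights=w)[0]
    for d, w in zip((data1, data2), (weight1, weight2))
)

# counts_above[k] is the weighted count of elements in [edges[k], edges[-1]]
counts_above = counts[::-1].cumsum()[::-1]

for x, c in zip(edges[:-1], counts_above):
    print(x, c)
```

Each dataset is binned only once, so the cost no longer grows with the number of thresholds.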
