Pandas calculate number of values between each range

Question

I want to find counts of my data between certain custom ranges.

Say I have some data:

import random
import pandas as pd

# 10 distinct integers drawn from 0..99 (range replaces Python 2's xrange)
my_randoms = random.sample(range(100), 10)
test = pd.DataFrame(my_randoms, columns=["x"])

How can I produce a data frame that shows the number of values between different ranges? For example, say I want to see how many values occur between 0-19, 20-39, 40-59, 60-79, 80-100. The output dataframe will have one column with those ranges, another with the counts.

I can think of some ugly approaches that involve use of .apply to get a new column list saying which value they are between (and then doing a groupby), but I suspect pandas has a cleaner way lurking about.

Answer 1

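Per Jarad's link to another question, group by the bins from pd.cut (a stop of 101 makes 100 the last bin edge):

import numpy as np
test.groupby(pd.cut(test['x'], np.arange(0, 101, 20))).count()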
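By default pd.cut builds right-closed bins such as (0, 20], so a value of 0 falls outside every bin. The sketch below is one possible way to get the 0-19, 20-39, ... grouping from the question by passing right=False. The fixed sample data, the label strings, and the names edges, labels, binned and result are assumptions made for illustration; the last bin is [80, 100), so it covers 80-99.

```python
import numpy as np
import pandas as pd

# Fixed sample data (an assumption, used so the result is reproducible)
test = pd.DataFrame({"x": [0, 5, 19, 20, 38, 40, 59, 60, 80, 99]})

# Edges 0, 20, ..., 100; right=False makes each bin closed on the left:
# [0, 20), [20, 40), ..., [80, 100)
edges = np.arange(0, 101, 20)
labels = ["0-19", "20-39", "40-59", "60-79", "80-99"]  # hypothetical labels
binned = pd.cut(test["x"], bins=edges, right=False, labels=labels)

# Count the values per bin, order by bin, and turn the result into
# a two-column DataFrame: one column of ranges, one of counts
counts = binned.value_counts().sort_index()
result = counts.rename_axis("range").reset_index(name="count")
print(result)
```

Because the bins are categorical, a bin with no values still appears in the output with a count of 0.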

Answer 2

There's probably a better way. I'm new to pandas myself, but how about this for the moment:

test[test.x.isin(range(20))]

Answer 3

pandas and NumPy allow boolean indexing. Is this an ugly approach?

ranges = [ (0,19), (20, 39), (40, 59) ...]
cnt = []
for lo, hi in ranges:
    # keep rows whose x lies within [lo, hi], both ends inclusive
    tmp = test[(test['x'] >= lo) & (test['x'] <= hi)]
    cnt.append(len(tmp))
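A self-contained version of the boolean-indexing loop might look like the sketch below. The sample data and the names bounds, rows and loop_result are assumptions for illustration; the bounds themselves are the ranges listed in the question, treated as inclusive on both ends.

```python
import pandas as pd

# Fixed sample data (an assumption, used so the result is reproducible)
test = pd.DataFrame({"x": [0, 5, 19, 20, 38, 40, 59, 60, 80, 99]})

# Inclusive (low, high) bounds taken from the question
bounds = [(0, 19), (20, 39), (40, 59), (60, 79), (80, 100)]

rows = []
for lo, hi in bounds:
    # Boolean mask: True where lo <= x <= hi
    mask = (test["x"] >= lo) & (test["x"] <= hi)
    # Summing a boolean mask counts the True entries
    rows.append({"range": f"{lo}-{hi}", "count": int(mask.sum())})

loop_result = pd.DataFrame(rows)
print(loop_result)
```

Summing the mask avoids building a filtered DataFrame for each range when only the count is needed.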

Answer 4

You can use the numpy.histogram function.

import numpy as np
series = [0, 20, 40, ...]
# the data to bin is the first argument; bins supplies the edges
count, bin_edge = np.histogram(test['x'], bins=series)

According to the numpy.histogram documentation, if bins is a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths.
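As a rough illustration of the edge behaviour, the sketch below runs np.histogram on fixed data. The sample values and the names values, edges, counts and bin_edges are assumptions; every bin is half-open on the right except the last one, which also includes its right edge.

```python
import numpy as np

# Fixed sample data (an assumption, used so the result is reproducible)
values = [0, 5, 19, 20, 38, 40, 59, 60, 80, 99]

# Six edges define five bins: [0, 20), [20, 40), [40, 60), [60, 80), [80, 100]
edges = [0, 20, 40, 60, 80, 100]

# counts[i] is the number of values that fall in the i-th bin
counts, bin_edges = np.histogram(values, bins=edges)
print(counts)
print(bin_edges)
```

Note that np.histogram returns plain arrays, so pairing each count with a readable range label is left to the caller.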

4 answers

solution1
7 ACCEPTED 2016-01-27 21:02:46

solution2
2 2016-01-27 20:48:54

solution3
1 2016-01-27 20:58:22

solution4
-1 2016-01-27 21:02:54
