
Pandas count number of occurrences of each value between ranges

I have a dataset with age as a continuous variable, and I want to count the number of occurrences of 1's and 0's in "Mental Health" for a number of age ranges, e.g. 18-25, 26-33, and so on.

Sample code is below:

import pandas as pd

df = pd.DataFrame([[18, 1], [45, 1], [56, 0], [26, 0], [35, 1]], columns=['Age', 'Mental_Health'])

What is the easiest way to do this? I'd rather not convert the age into a range if I can avoid it, though I will if I have to. Ideally I'm looking for output along the lines of 18-25: suffering = 24, not suffering = 21, and so on for all age ranges.

You want pd.cut. You can define arbitrary bins (I've used a hand-picked list of edges below). It bins the passed Series into intervals, and you can then group by those intervals to count how many rows fall in each:

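# right=False makes the bins half-open [left, right), so 18 lands in [18, 25)
df["age_range"] = pd.cut(df.Age, bins=[0,18,25,33,99], right=False)
# observed=False keeps empty bins such as [0, 18) in the result
grouped = df.groupby("age_range", observed=False).Mental_Health
df2 = grouped.sum().to_frame(name="suffering")
df2["not_suffering"] = grouped.count() - df2.suffering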
  

Output:

           suffering  not_suffering
age_range
[0, 18)            0              0
[18, 25)           1              0
[25, 33)           0              1
[33, 99)           2              1
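
If you would rather see the labels from the question (18-25, 26-33, ...) than interval notation, one possible sketch is to pass labels to pd.cut and tabulate with pd.crosstab. The bin edges, the label strings and the column names below are assumptions chosen to match the ranges in the question, and the data is the question's sample. The binned ages are kept in a separate Series, so no range column is added to df.

```python
import pandas as pd

df = pd.DataFrame(
    [[18, 1], [45, 1], [56, 0], [26, 0], [35, 1]],
    columns=["Age", "Mental_Health"],
)

# Assumed edges: with the default right=True each bin is (left, right],
# so edges 17, 25, 33, ... give the inclusive ranges 18-25, 26-33, ...
edges = [17, 25, 33, 41, 49, 57]
labels = [f"{left + 1}-{right}" for left, right in zip(edges[:-1], edges[1:])]

# Binned ages live in their own Series; df itself is not modified.
age_group = pd.cut(df["Age"], bins=edges, labels=labels)

# Cross-tabulate age group against the 0/1 flag, then name the columns.
counts = pd.crosstab(age_group, df["Mental_Health"])
counts = counts.rename(columns={0: "not_suffering", 1: "suffering"})
print(counts)
```

Each row of counts is one age range, with one column per value of Mental_Health.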

Try this:

import pandas as pd
import numpy as np
df = pd.DataFrame([[18, 1], [45, 1], [56, 0], [26, 0], [35, 1]], columns=['Age', 'Mental_Health'])

df['cuts'] = pd.cut(df['Age'], np.arange(0,100,15))

df.pivot_table(index='cuts', columns='Mental_Health', values='Age', aggfunc='count').fillna(0).astype(int)

Output:

Mental_Health  0  1
cuts               
(15, 30]       1  1
(30, 45]       0  2
(45, 60]       1  0
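
One thing to watch with either approach: pd.cut returns NaN for any age that falls outside the bin edges, and those rows then drop out of the counts silently. A small sketch of how to check for that, using the same edges as above and made-up ages purely for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative ages (not from the question); 95 is above the last edge.
ages = pd.Series([15, 18, 30, 95], name="Age")

# Same edges as above: 0, 15, 30, ..., 90 with right-closed bins (left, right].
binned = pd.cut(ages, np.arange(0, 100, 15))

# Ages that fall in no bin come back as NaN; count them before aggregating.
unbinned = int(binned.isna().sum())
print(binned)
print("ages outside every bin:", unbinned)
```

If the count is not zero, widen the edges (or add a catch-all bin) so every row is included.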
