
How to find the median age for a DataFrame of population with columns of age and count?

df looks like this:

   age  population
0   20           2
1   21           3
2   22           2
3   23           5
4   24           7

import pandas as pd
df = pd.DataFrame({'age': [20, 21, 22, 23, 24], 'population': [2, 3, 2, 5, 7]})

I'd like to calculate the median age of the total population. Is there a simple way to do this?

I got the average like this, but I need the median:

df['years'] = df['age'] * df['population']
average_age = df['years'].sum() / df['population'].sum()

Multiplying two pandas Series is different from multiplying a list by an integer: it does not copy each value N times, it performs element-wise multiplication.
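To make the difference concrete, here is a small sketch using a shortened version of the data (the names `age`, `count`, `products` and `repeated_list` are only illustrative):

```python
import pandas as pd

age = pd.Series([20, 21, 22])
count = pd.Series([2, 3, 2])

# Series * Series multiplies row by row: one product per row.
products = (age * count).tolist()   # [40, 63, 44]

# list * int repeats the list contents instead.
repeated_list = [20] * 2            # [20, 20]

print(products)
print(repeated_list)
```

The products are what you want for a weighted mean (sum of products divided by the total count), but they say nothing about where the middle of the population lies.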

Use pd.Series.repeat to repeat each element N times, and then use the .median method to calculate the median of the resulting pandas Series:

import pandas as pd

df = pd.DataFrame({'age': [20, 21, 22, 23, 24], 'population': [2, 3, 2, 5, 7]})
# Expand each age into `population` copies, then take the ordinary median.
m = df['age'].repeat(df['population']).median()
print(m)  # output: 23.0
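If the counts are very large (real census figures, say), expanding every row with repeat may use a lot of memory. A possible alternative, sketched here under the assumption that the counts are non-negative integers with a positive total, is to locate the middle position(s) through the cumulative counts. The helper name `weighted_median` is made up for this example and is not a pandas function:

```python
import numpy as np
import pandas as pd

def weighted_median(values, counts):
    """Median of `values`, where each value occurs `counts` times.

    Hypothetical helper; assumes non-negative integer counts
    whose total is greater than zero.
    """
    v = np.asarray(values)
    c = np.asarray(counts)
    order = np.argsort(v)          # sort values, keep counts aligned
    v, c = v[order], c[order]
    cum = np.cumsum(c)             # running total of people up to each value
    total = cum[-1]
    # Zero-based positions of the middle element(s) in the expanded data.
    lo_pos = (total - 1) // 2
    hi_pos = total // 2
    # First group whose cumulative count exceeds the position.
    lo = v[np.searchsorted(cum, lo_pos, side="right")]
    hi = v[np.searchsorted(cum, hi_pos, side="right")]
    return (lo + hi) / 2

df = pd.DataFrame({'age': [20, 21, 22, 23, 24], 'population': [2, 3, 2, 5, 7]})
print(weighted_median(df['age'], df['population']))
```

For an odd total the two positions coincide; for an even total the two middle values are averaged, which matches what .median does on the expanded Series.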

