简体   繁体   中英

calculate the median age for each region from frequency table with python

I have a dataframe that is similar to:

在此处输入图片说明

I would like to calculate the median age for each city but given that it is a frequency table I'm finding it somewhat tricky. Is there any function in pandas or other that would help me achieve this?

For each row, find the number of instances there are. Then take that number, divide by 2, and determine what age that would be by checking if the number of people have the age smaller than what we are looking for.

For example, for the row 'alabama', you would add 34 + 67 + ... + 23 = 5463. That, divided by 2, would be 2731.5 ==> 2731. Then, checking each age group, determine where the 2731th person would be.

  • At age 1, since 2731 > 34, check the next.
  • At age 2, since 2731 > 34 + 67, check the next.
  • At age 3, since 2731 > 34 + 67 + 89, check the next.
  • At age 4, since 2731 > 34 + 67 + 89 + 89, check the next.
  • At age 5, since 2731 > 34 + 67 + 89 + 89 + 67, check the next.
  • At age 6, since 2731 > 34 + 67 + 89 + 89 + 67 + 545, check the next.
  • At age 7, since 2731 < 34 + 67 + 89 + 89 + 67 + 545 + 4546, the median has to be in this age group.

Do this repeatedly for each city/state, and you should get the median for each one.

Maybe this works for you:

import numpy as np
import pandas as pd

# create dataframe
df = pd.DataFrame(
    [
        ['Alabama', 34, 67, 89, 89, 67, 545, 4546, 3, 23], 
        ['Georgia', 345, 65, 67, 32, 23, 567, 87, 647, 68]
    ],
    columns=['City', 0, 1, 2, 3, 4, 5, 6, 7, 8]
).set_index('City')
print(df)

# calculate median for freq table
m = list() # median list
for index, row in df.iterrows():
    v = list() # value list
    z = zip(row.index, row.values)
    for item in z:
        for f in range(item[1]):
            v.append(item[0])
    m.append(np.median(v))
df_m = pd.DataFrame({'City': df.index, 'Median': m})
print(df_m)

Input:

           0   1   2   3   4    5     6    7   8
City                                            
Alabama   34  67  89  89  67  545  4546    3  23
Georgia  345  65  67  32  23  567    87  647  68

Output:

      City  Median
0  Alabama     6.0
1  Georgia     5.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM