For each row, add up the counts to get the total number of people. Divide that total by 2 to find the position of the middle person, then walk through the age groups in order, keeping a running total of the counts, until the running total reaches that position. The age group where that happens is the median.
For example, for the row 'Alabama', you would add 34 + 67 + ... + 23 = 5463. Half of that is 2731.5, which rounds up to 2732, so the median is the age of the 2732nd person. Then, checking each age group in turn, determine which group contains the 2732nd person.
Repeat this for each city/state, and you should get the median for each one.
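The steps above can be sketched in plain Python as follows. This is only a minimal illustration, not a definitive implementation: the function name `median_from_counts` is made up for this example, and it assumes the ages are listed in ascending order with one count per age. For an even total it averages the two middle people, which is the usual convention.

```python
from itertools import accumulate


def median_from_counts(ages, counts):
    """Median of a frequency table, without expanding it into one value per person.

    Hypothetical helper: assumes `ages` is sorted ascending and
    `counts[i]` is the number of people with age `ages[i]`.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")

    # 1-based ranks of the middle person(s); they coincide when total is odd.
    lo_rank = (total + 1) // 2
    hi_rank = total // 2 + 1

    def age_at(rank):
        # Walk the age groups until the running total reaches the rank.
        for age, running in zip(ages, accumulate(counts)):
            if running >= rank:
                return age

    return (age_at(lo_rank) + age_at(hi_rank)) / 2


ages = list(range(9))
alabama = [34, 67, 89, 89, 67, 545, 4546, 3, 23]
print(median_from_counts(ages, alabama))  # 6.0
```

For Alabama the running totals are 34, 101, 190, 279, 346, 891, 5437, so the 2732nd person falls in age group 6.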
Maybe this works for you:
import numpy as np
import pandas as pd

# create dataframe
df = pd.DataFrame(
    [
        ['Alabama', 34, 67, 89, 89, 67, 545, 4546, 3, 23],
        ['Georgia', 345, 65, 67, 32, 23, 567, 87, 647, 68]
    ],
    columns=['City', 0, 1, 2, 3, 4, 5, 6, 7, 8]
).set_index('City')
print(df)

# calculate median for freq table
m = list()  # median list
for index, row in df.iterrows():
    v = list()  # value list
    z = zip(row.index, row.values)
    for item in z:
        # repeat each age (item[0]) as many times as its count (item[1])
        for f in range(item[1]):
            v.append(item[0])
    m.append(np.median(v))

df_m = pd.DataFrame({'City': df.index, 'Median': m})
print(df_m)
Input:
           0   1   2   3   4    5     6    7   8
City
Alabama   34  67  89  89  67  545  4546    3  23
Georgia  345  65  67  32  23  567    87  647  68
Output:
      City  Median
0  Alabama     6.0
1  Georgia     5.0