I have two columns, 'age' and 'age_band', and some of the 'age_band' values are missing.
What I want to do is fill the missing 'age_band' values based on 'age'. For example, if 'age' is between 30 and 39, the missing 'age_band' should be 30; if 'age' is between 80 and 89, it should be 80. How can I do this in a pandas DataFrame?
I tried the following, but the loop seems to run forever:
for ages in data['age']:
    # fillna() fills every missing value in the whole column, so the first
    # iteration already fills all NaNs with the band of the first row's age.
    # Each later iteration still rebuilds the full column, so the total work
    # grows quadratically with the number of rows, which is why it is so slow.
    if 0 <= ages <= 9:
        data['age_band'] = data['age_band'].fillna(0)
    elif 10 <= ages <= 19:
        data['age_band'] = data['age_band'].fillna(10)
    elif 20 <= ages <= 29:
        data['age_band'] = data['age_band'].fillna(20)
    elif 30 <= ages <= 39:
        data['age_band'] = data['age_band'].fillna(30)
    elif 40 <= ages <= 49:
        data['age_band'] = data['age_band'].fillna(40)
    elif 50 <= ages <= 59:
        data['age_band'] = data['age_band'].fillna(50)
    elif 60 <= ages <= 69:
        data['age_band'] = data['age_band'].fillna(60)
    elif 70 <= ages <= 79:
        data['age_band'] = data['age_band'].fillna(70)
    elif 80 <= ages <= 89:
        data['age_band'] = data['age_band'].fillna(80)
    elif 90 <= ages <= 99:
        data['age_band'] = data['age_band'].fillna(90)
    elif 100 <= ages <= 109:
        data['age_band'] = data['age_band'].fillna(100)
Please help.
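To see why the loop above gives wrong results, here is a minimal sketch on a hypothetical three-row frame. It replaces the if/elif chain with ages // 10 * 10, which picks the same band for ages 0-109, and shows that the very first fillna() call fills every row with the first age's band:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame in which every age_band is missing
data = pd.DataFrame({"age": [93, 46, 19], "age_band": [np.nan, np.nan, np.nan]})

for ages in data["age"]:
    band = ages // 10 * 10  # same band the if/elif chain picks for ages 0-109
    # fillna() replaces every NaN in the column, not just this row's value
    data["age_band"] = data["age_band"].fillna(band)

print(data["age_band"].tolist())  # [90.0, 90.0, 90.0]
```

Rows with ages 46 and 19 end up with band 90, because nothing is left to fill after the first iteration.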
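Try this shortcut. data['age'] // 10 * 10 rounds each age down to the start of its decade (e.g. 37 becomes 30), and fillna() uses that value only where 'age_band' is missing: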
data['age_band'] = data['age_band'].fillna(data['age'] // 10 * 10).astype(int)
print(data)
# Output
   age  age_band
0   93        90
1   46        40
2   50        50
3   56        50
4   89        80
5   19        10
6   25        20
7   17        10
8   54        50
9   42        40
Setup:
import pandas as pd
import numpy as np
np.random.seed(2022)
data = pd.DataFrame({'age': np.random.randint(1, 111, 10), 'age_band': np.nan})
print(data)
# Output
   age  age_band
0   93       NaN
1   46       NaN
2   50       NaN
3   56       NaN
4   89       NaN
5   19       NaN
6   25       NaN
7   17       NaN
8   54       NaN
9   42       NaN
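A quick check of the shortcut on a small hypothetical frame in which some 'age_band' values already exist. It shows that only the missing values are filled and that the decade boundaries land where the question expects (9 gives 0, 10 gives 10, 39 gives 30):

```python
import numpy as np
import pandas as pd

# Hypothetical data: some age_band values already present, the rest missing
data = pd.DataFrame({"age": [0, 9, 10, 39, 85, 109],
                     "age_band": [0, np.nan, 10, np.nan, np.nan, np.nan]})

# Floor division then multiply: 39 // 10 * 10 == 30, 85 // 10 * 10 == 80
decade = data["age"] // 10 * 10

# fillna(Series) aligns on the index and only fills rows where age_band is NaN
data["age_band"] = data["age_band"].fillna(decade).astype(int)
print(data)
```

If 'age' itself can be missing, the filled column still contains NaN and astype(int) raises an error; astype("Int64") (pandas' nullable integer dtype) keeps those rows as <NA> instead. Also, unlike the if/elif chain, the formula keeps going past 109, so an age of 115 gets band 110.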
The answers above only work when all age bins have the same width. pd.cut() also handles bins of different widths, because you pass the bin edges explicitly.
You can also pass labels to pd.cut(). In the following example, bins holds the interval edges, so 0-9 is one interval, 10-19 is the next, and so on, and labels gives the matching names "0-9", "10-19", etc. The result is stored in the 'age_band' column.
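import numpy as np
import pandas as pd
# pd.cut needs exactly len(bins) - 1 labels; np.inf catches ages above 109 for ">109"
bins = [0, 9, 19, 29, 39, 49, 59, 69, 79, 89, 99, 109, np.inf]
labels = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90-99", "100-109", ">109"]
data['age_band'] = pd.cut(data['age'], bins=bins, labels=labels, include_lowest=True)  # include_lowest keeps age 0 in "0-9"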
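pd.cut() is most useful when the bands are not all ten years wide. The sketch below uses hypothetical bands 0-17, 18-64 and 65+ (the edges, labels and data are assumptions for illustration), labels each band with its lower bound so the result matches the numeric style of 'age_band' in the question, and fills only the missing values instead of overwriting the column:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"age": [5, 17, 30, 64, 70, 95],
                     "age_band": [0.0, np.nan, np.nan, 18.0, np.nan, np.nan]})

# Hypothetical unequal bands, left-closed because right=False: [0, 18), [18, 65), [65, inf)
edges = [0, 18, 65, np.inf]
lower_bounds = [0, 18, 65]  # label each band with its lower bound, like age_band in the question

band = pd.cut(data["age"], bins=edges, labels=lower_bounds, right=False)

# Convert the categorical result to numbers and fill only the missing age_band values
data["age_band"] = data["age_band"].fillna(band.astype(float))
print(data)
```

right=False makes each interval include its left edge, so an age of exactly 18 falls in the 18 band rather than the 0 band.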