Fill NaN values in one column based on other columns

Question

I am working on a dataset of average age of marriage and am currently cleaning the data. The location column contains NaN values that I need to fill, but it also has many unique values, so it is not obvious what to fill them with. I would like suggestions on how to fill NaN values in a column that has many unique values.

I have attached the dataset for reference, DataSet

Answer 1

I suggest doing it in 3 steps:

1. Fill the missing values of location with either the most common location or a separate value such as "Unknown".
2. Fill the missing values of "age_of_marriage" with the median of this feature within each location.
3. If any missing values of "age_of_marriage" remain (this happens when every row for a location lacks an age), fill them with the overall mean.

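import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/atharva07/Age-of-marriage/main/age_of_marriage_data.csv', sep=',')
# Step 1: label missing locations explicitly
df['location'] = df['location'].fillna('Unknown')
# Step 2: fill missing ages with the median age of the same location (transform keeps the original index)
df['age_of_marriage'] = df.groupby('location')['age_of_marriage'].transform(lambda x: x.fillna(x.median()))
# Step 3: fill anything still missing with the overall mean
df['age_of_marriage'] = df['age_of_marriage'].fillna(df['age_of_marriage'].mean())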
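
The first step also mentions filling with the most common location instead of "Unknown". A minimal sketch of that variant follows. It runs on a small made-up frame (the `toy` data and its values are assumptions for illustration, not rows from the real dataset) and then applies the same per-location median fill:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "location": ["Delhi", "Delhi", np.nan, "Mumbai", np.nan],
    "age_of_marriage": [29.0, np.nan, 31.0, 27.0, np.nan],
})

# Most frequent non-missing location; mode() skips NaN by default
most_common = toy["location"].mode().iloc[0]
toy["location"] = toy["location"].fillna(most_common)

# Fill missing ages with the median age of the same location
toy["age_of_marriage"] = (
    toy.groupby("location")["age_of_marriage"]
       .transform(lambda s: s.fillna(s.median()))
)

print(toy)
```

Filling with the mode keeps the set of categories unchanged but inflates the count of the most common location, whereas "Unknown" keeps the missing rows identifiable. Which is preferable likely depends on how many values are missing.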

1 answers

solution1
3 2022-04-10 06:20:30