
How to fill NaN values in a DataFrame column with a value from the same column when both rows share an equal value in another column (like a WHERE clause)?

I am a complete beginner in Python and need your help with a problem. I have a dataset about coronavirus mortality. It has 2 columns: the neighbourhood name (column name: Neighbourhood Name), which is based on the postal code column (column name: FSA), and the postal code column, which is filled based on the Neighbourhood Name column.

I am trying to fill the NaN values in both columns.

Here is what I tried:

1 - Getting the data into Jupyter:

covid_df.head(5)

Output: [image not preserved]

covid_df.isnull().sum().to_frame()

[Image: Null Values]

covid_sub_df = covid_df.loc[:, ['Neighbourhood Name', 'FSA']]
covid_sub_df

covid_sub_df_2 = covid_sub_df.drop_duplicates()
covid_sub_df_2
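Since covid_sub_df_2 already holds the distinct (Neighbourhood Name, FSA) pairs, one vectorized option is to turn the complete pairs into two lookup Series, pass them to Series.map(), and use fillna() so that only the missing cells are filled. This is a minimal sketch on made-up dummy rows (not the real covid_df), assuming each FSA corresponds to one neighbourhood name and vice versa; where a key has several partners, it simply keeps the first pairing.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data standing in for covid_df; the real dataset is not shown.
covid_df = pd.DataFrame({
    "Neighbourhood Name": ["Alpha", np.nan, "Beta", "Beta", np.nan],
    "FSA": ["A1A", "A1A", np.nan, "B2B", np.nan],
})

# Rows where both columns are present are the known (name, FSA) pairs.
pairs = covid_df[["Neighbourhood Name", "FSA"]].dropna().drop_duplicates()

# One lookup per direction; drop_duplicates(subset=...) keeps the index unique,
# which Series.map() needs when the mapping is itself a Series.
fsa_to_name = pairs.drop_duplicates(subset="FSA").set_index("FSA")["Neighbourhood Name"]
name_to_fsa = pairs.drop_duplicates(subset="Neighbourhood Name").set_index("Neighbourhood Name")["FSA"]

# Fill each column only where it is missing, using the other column as the key.
covid_df["Neighbourhood Name"] = covid_df["Neighbourhood Name"].fillna(
    covid_df["FSA"].map(fsa_to_name)
)
covid_df["FSA"] = covid_df["FSA"].fillna(covid_df["Neighbourhood Name"].map(name_to_fsa))

print(covid_df)
```

Rows where both values are missing stay NaN, because there is nothing to look them up by.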

Then I tried this:

val = ""
for i, j in covid_df['Neighbourhood Name'], covid_df['FSA']:
    for k,l in covid_sub_df_2['Neighbourhood Name'], covid_sub_df_2['FSA']:
        if k == val and j == l:
            covid_df['Neighbourhood Name'] = covid_sub_df['Neighbourhood Name']
        if j == val and k == i:
            covid_df['FSA'] = covid_sub_df['FSA']

I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
      1 val = ""
----> 2 for i, j in covid_df['Neighbourhood Name'], covid_df['FSA']:
      3     for k,l in covid_sub_df_2['Neighbourhood Name'], covid_sub_df_2['FSA']:
      4         if k == val and j == l:
      5             covid_df['Neighbourhood Name'] = covid_sub_df['Neighbourhood Name']

ValueError: too many values to unpack (expected 2)
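The error can be reproduced without the dataset. In this sketch (hypothetical three-element Series), writing "for i, j in names, codes" iterates over the tuple (names, codes), so on the first pass Python tries to unpack a whole Series into just two names:

```python
import pandas as pd

# Hypothetical stand-ins for the two columns.
names = pd.Series(["Alpha", "Beta", "Gamma"])
codes = pd.Series(["A1A", "B2B", "C3C"])

message = None
try:
    # The loop runs over the 2-tuple (names, codes); the first item is the
    # entire `names` Series, whose three values cannot be unpacked into i, j.
    for i, j in names, codes:
        pass
except ValueError as err:
    message = str(err)

print(message)
```

With only two rows per column the unpacking would silently succeed and produce wrong pairs, which makes this bug easy to miss on tiny test data.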

Thank you all.

So what you need to do is get rid of the following error?

ValueError: too many values to unpack (expected 2)

The question isn't very specific, because the title asks how to fill NaN values. Also, you should try to provide dummy data if possible.
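As an illustration of such dummy data, a few made-up rows with the same two column names are enough to reproduce the problem, and the null-count call from the question works on them unchanged. The values below are invented for the example.

```python
import numpy as np
import pandas as pd

# Made-up rows with the same two columns as the question's covid_df.
covid_df = pd.DataFrame({
    "Neighbourhood Name": ["Alpha", np.nan, "Beta", "Beta", np.nan],
    "FSA": ["A1A", "A1A", np.nan, "B2B", np.nan],
})

# Same null count the question computes, shown as a one-column frame.
null_counts = covid_df.isnull().sum().to_frame(name="Null Values")
print(null_counts)
```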

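However, assuming you want to get rid of the error: "for i, j in covid_df['Neighbourhood Name'], covid_df['FSA']" loops over a tuple of the two whole columns, so on the first iteration Python tries to unpack an entire column (many values) into just i and j. You probably meant to loop over both columns simultaneously, and the built-in zip() function does exactly that. So the following modification should hopefully work: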

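val = ""
# zip() pairs the two columns element by element, so each iteration
# yields one (name, FSA) tuple that unpacks cleanly into i, j
for i, j in zip(covid_df['Neighbourhood Name'], covid_df['FSA']):
    for k, l in zip(covid_sub_df_2['Neighbourhood Name'], covid_sub_df_2['FSA']):
        # note: a NaN cell never equals "", so these checks won't match missing values
        if k == val and j == l:
            # this assigns the whole column, not a single cell
            covid_df['Neighbourhood Name'] = covid_sub_df['Neighbourhood Name']
        if j == val and k == i:
            covid_df['FSA'] = covid_sub_df['FSA']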
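If you want to keep the loop approach, two more details matter: a missing cell holds np.nan rather than "", and np.nan never compares equal to anything (not even itself), so "k == val" cannot detect it; and assigning covid_df['FSA'] = ... replaces the whole column. This is a minimal sketch on hypothetical dummy rows that checks with pd.isna() and writes one cell at a time with .loc. It is O(rows × pairs), so the vectorized map()/fillna() route scales better on a large dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data standing in for covid_df.
covid_df = pd.DataFrame({
    "Neighbourhood Name": ["Alpha", np.nan, "Beta", "Beta"],
    "FSA": ["A1A", "A1A", np.nan, "B2B"],
})

# NaN never compares equal to "" or even to itself, so test with pd.isna().
print(np.nan == "", np.nan == np.nan, pd.isna(np.nan))  # False False True

# Complete (name, FSA) pairs play the role of covid_sub_df_2.
known = covid_df.dropna().drop_duplicates()

# Materialize the rows first so the loop does not read columns it is writing to.
rows = list(zip(covid_df.index, covid_df["Neighbourhood Name"], covid_df["FSA"]))
for idx, name, fsa in rows:
    for k, l in zip(known["Neighbourhood Name"], known["FSA"]):
        if pd.isna(name) and fsa == l:
            covid_df.loc[idx, "Neighbourhood Name"] = k  # fill this one cell only
        if pd.isna(fsa) and name == k:
            covid_df.loc[idx, "FSA"] = l  # fill this one cell only

print(covid_df)
```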

It is not clear which values you want to fill your NaN values with. One option is to use the pandas DataFrame.replace method:

covid_df = covid_df.replace({np.nan: new_value})

This replaces every NaN value with new_value. It works because pandas is built on top of NumPy, a well-known Python library, and represents missing values as np.nan. You need to import NumPy first for this to work:

import numpy as np

Be aware that every NaN value will be replaced with exactly the same value, the one stored in new_value.
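For comparison, fillna() is the more common pandas call for a constant fill and gives the same result here. The sketch below uses made-up values, with "Unknown" as a hypothetical placeholder for new_value. It also shows that both calls return a new DataFrame and leave the original untouched, so the result has to be assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data with one missing value per column.
df = pd.DataFrame({
    "Neighbourhood Name": ["Alpha", np.nan],
    "FSA": [np.nan, "B2B"],
})
new_value = "Unknown"  # hypothetical placeholder value

# Both calls return a new DataFrame; df itself is left unchanged.
replaced = df.replace({np.nan: new_value})
filled = df.fillna(new_value)

print(replaced)
```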
