
Pandas replace NaN value by string under conditions

What I am trying to do is replace NaN values with the string 'school'. If Longitude is within the range 114.8 to 115.2 and Latitude is within the range 19.8 to 20.2, the NaN value in the Location column will be replaced by the string 'School'.

df=
    Date            Longitude Latitude   Location
0   2020-01-01 01:00    115.1   20.0         NaN 
1   2020-01-01 01:01    115.0   20.1         NaN
2   2020-01-01 01:02    114.9   19.9         NaN
3   2020-01-01 01:03    123.1   20.0         NaN
4   2020-01-01 01:04    115.0   18.9         NaN

I would like to convert my DataFrame as follows

df=
    Date            Longitude Latitude   Location
0   2020-01-01 01:00    115.1   20.0      school
1   2020-01-01 01:01    115.0   20.1      school
2   2020-01-01 01:02    114.9   19.9      school
3   2020-01-01 01:03    123.1   20.0       NaN
4   2020-01-01 01:04    115.0   18.9       NaN

What I have tried to do is

df.loc[((df['Longitude']<115.2) & (df['Longitude']>114.8) & (df['Latitude']>19.8) & (df['Latitude']<20.2)), df['Location']]='School'

However, I get an error

KeyError: "None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              ...\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],\n             dtype='float64', length=7441)] are in the [columns]"
    

I am not sure why this is happening. Thanks a lot for reading my question!

You are close:

df.loc[((df['Longitude']<115.2) & (df['Longitude']>114.8) 
         & (df['Latitude']>19.8) & (df['Latitude']<20.2)), 
      'Location']='School'

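You can also use between:

# inclusive='neither' excludes both endpoints (pandas 1.3+; older versions used inclusive=False)
df.loc[(df['Longitude'].between(114.8, 115.2, inclusive='neither')
        & df['Latitude'].between(19.8, 20.2, inclusive='neither')),
       'Location'] = 'School'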
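
To check the between form against the sample data from the question, here is a minimal sketch. The frame is rebuilt by hand from the table in the question, the name in_box is only illustrative, and inclusive="neither" assumes pandas 1.3 or later.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-01-01 01:00", "2020-01-01 01:01", "2020-01-01 01:02",
        "2020-01-01 01:03", "2020-01-01 01:04",
    ]),
    "Longitude": [115.1, 115.0, 114.9, 123.1, 115.0],
    "Latitude": [20.0, 20.1, 19.9, 20.0, 18.9],
    "Location": [np.nan] * 5,
})

# An all-NaN column is float64 by default; cast to object so it can
# hold strings next to the remaining NaN values.
df["Location"] = df["Location"].astype(object)

# Boolean mask: True where the point lies strictly inside the box.
in_box = (
    df["Longitude"].between(114.8, 115.2, inclusive="neither")
    & df["Latitude"].between(19.8, 20.2, inclusive="neither")
)

# Row selector first, column label second.
df.loc[in_box, "Location"] = "School"
print(df)
```

Building the mask as a separate variable keeps the .loc call short and makes it easy to inspect which rows matched.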

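In df.loc[rows, columns], the second argument must be a column label (or a list of labels). You passed the Series df['Location'], so pandas treated its values (all NaN) as the labels to look up, found none of them among the columns, and raised the KeyError.

Pass only the column name, i.e. 'Location', and it will work.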

df.loc[((df['Longitude']<115.2) & (df['Longitude']>114.8) & 
      (df['Latitude']>19.8) & (df['Latitude']<20.2)), 
      'Location'] = 'School'
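
The difference between a label and the column's values can be seen without triggering the error. This is a small sketch on a two-row frame made up for illustration; the variable names are assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Two illustrative rows: one inside the box, one outside.
df = pd.DataFrame({
    "Longitude": [115.1, 123.1],
    "Latitude": [20.0, 20.0],
    "Location": pd.Series([np.nan, np.nan], dtype=object),
})

col_labels = list(df.columns)            # what .loc can look up
passed_values = df["Location"].tolist()  # what df['Location'] hands over: NaN, NaN

label_exists = "Location" in col_labels
values_are_labels = any(v in col_labels for v in passed_values)
print(label_exists)       # the string label is a column
print(values_are_labels)  # the NaN values are not column labels

# Using the label as the column selector performs the assignment.
mask = (
    (df["Longitude"] > 114.8) & (df["Longitude"] < 115.2)
    & (df["Latitude"] > 19.8) & (df["Latitude"] < 20.2)
)
df.loc[mask, "Location"] = "School"
print(df)
```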

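Alternatively, use NumPy's where function:

import numpy as np

# between() includes both endpoints by default
in_box = df['Longitude'].between(114.8, 115.2) & df['Latitude'].between(19.8, 20.2)
# Fall back to the existing values (cast to object so NaN stays a real missing value)
df['Location'] = np.where(in_box, 'school', df['Location'].astype(object))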
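
One caveat with np.where is that it rebuilds the whole column, so any Location that is already filled in needs to be carried over, and mixing a string with a float NaN can lead NumPy to produce a string array in which missing entries become the text 'nan'. The sketch below is one way to guard against both, assuming a made-up frame in which the third row already has a value ("Home" is hypothetical data, not from the question).

```python
import numpy as np
import pandas as pd

# Illustrative frame; "Home" is a hypothetical pre-existing value.
df = pd.DataFrame({
    "Longitude": [115.1, 123.1, 115.0],
    "Latitude": [20.0, 20.0, 20.1],
    "Location": pd.Series([np.nan, np.nan, "Home"], dtype=object),
})

in_box = (
    df["Longitude"].between(114.8, 115.2)
    & df["Latitude"].between(19.8, 20.2)
)

# Only fill rows that are inside the box AND currently missing.
fill = in_box & df["Location"].isna()

# The object-dtype fallback keeps NaN as a real missing value and
# leaves existing strings untouched.
df["Location"] = np.where(fill, "school", df["Location"].astype(object))
print(df)
```

Compared with the .loc assignment, this form is mostly useful when the replacement itself depends on the row, since both branches of np.where can be arrays.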
