
Select rows from a DataFrame based on presence of null value in specific column or columns

I have imported an xls file as a pandas dataframe. There are two columns containing coordinates, which I will use to merge the dataframe with others that have geolocation data. df.info() shows 8859 records, but the coordinate columns show '8835 non-null float64'.

I want to eyeball the 24 rows (which I assume are null), with all columns shown, to see whether one of the other columns (street address, town) can be used to manually add back the coordinates for those 24 records. I.e., return the rows of the dataframe where df['Easting'] is null/NaN.

I have adapted the method given here as below:

df.loc[df['Easting'] == NaN]

But I get back an empty dataframe (0 rows × 24 columns), which makes no sense to me. Attempting to use Null or Not Null doesn't work either, as those values aren't defined. What am I missing?

I think you need isnull for checking NaN values, with boolean indexing:

df[df['Easting'].isnull()]
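
If the second coordinate column has gaps in different rows, you can combine the checks with a boolean OR. A sketch; 'Northing' is an assumed name for the second coordinate column, which the question doesn't give:

# 'Northing' is hypothetical -- substitute the actual second coordinate column
df[df['Easting'].isnull() | df['Northing'].isnull()]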

Docs:

Warning

One has to be mindful that in Python (and NumPy), the nan's don't compare equal, but None's do. Note that pandas/NumPy uses the fact that np.nan != np.nan, and treats None like np.nan.

In [11]: None == None
Out[11]: True

In [12]: np.nan == np.nan
Out[12]: False

So as compared to above, a scalar equality comparison versus a None/np.nan doesn't provide useful information.

In [13]: df2['one'] == np.nan
Out[13]: 
a    False
b    False
c    False
d    False
e    False
f    False
g    False
h    False
Name: one, dtype: bool
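
For a self-contained illustration of why the equality comparison fails while isnull works (df2 and its values are invented here for demonstration; the docs' df2 is not shown in the quote):

import numpy as np
import pandas as pd

# A small frame with one missing value, mirroring the docs example above
df2 = pd.DataFrame({'one': [1.0, np.nan, 3.0]}, index=['a', 'b', 'c'])

print(df2['one'] == np.nan)   # all False: NaN never compares equal, even to itself
print(df2['one'].isnull())    # True exactly at the missing entry ('b')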
