Select rows with specific values in columns and include rows with NaN in pandas dataframe

I have a DataFrame df that looks something like this:

df
   a         b         c
0  0.557894 -0.196294 -0.020490
1  1.138774 -0.699224       NaN
2       NaN  2.384483  0.554292
3 -0.069319       NaN  1.162941
4  1.040089 -0.271777       NaN
5 -0.337374       NaN -0.771888
6 -1.813278 -1.564666       NaN
7       NaN       NaN       NaN
8  0.737413       NaN  0.679575
9 -2.345448  2.443669 -1.409422

I want to select the rows where column 'c' is at or above some threshold, which I would normally do with:

new_df = df[df['c'] >= .5]

but that will return:

          a         b         c
2       NaN  2.384483  0.554292
3 -0.069319       NaN  1.162941
8  0.737413       NaN  0.679575
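The NaN rows disappear because any ordered comparison with NaN evaluates to False in pandas (as in NumPy), so those rows fail the mask just like the small values do. Here is a minimal sketch on the first three values of column 'c':

```python
import numpy as np
import pandas as pd

# First three values of column 'c' from the question
c = pd.Series([-0.020490, np.nan, 0.554292], name="c")

# NaN >= .5 is False, so the NaN row is excluded along with the small value
mask = c >= .5
print(mask.tolist())  # [False, False, True]
```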

I want to get those rows, but also keep the rows that have NaN values in column 'c'. I haven't been able to find a question asking for the same thing; they usually ask for one or the other, but not both. I could hard-code the rows I want to drop, since I know the specific values, but I was wondering if there is a better solution. The end result should look something like this:

   a         b         c
1  1.138774 -0.699224       NaN
2       NaN  2.384483  0.554292
3 -0.069319       NaN  1.162941
4  1.040089 -0.271777       NaN
6 -1.813278 -1.564666       NaN
7       NaN       NaN       NaN
8  0.737413       NaN  0.679575

Only rows 0, 5 and 9 are dropped, since their values in column 'c' are less than .5.

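You should combine the two conditions with the element-wise | (or) operator. Wrap each comparison in parentheses, because | binds more tightly than >= in Python.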

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [0.557894,1.138774,np.nan,-0.069319,1.040089,-0.337374,-1.813278,np.nan,0.737413,-2.345448],
                   'b': [-0.196294,-0.699224,2.384483,np.nan,-0.271777,np.nan,-1.564666,np.nan,np.nan,2.443669],
                   'c': [-0.020490,np.nan,0.554292,1.162941,np.nan,-0.771888,np.nan,np.nan,0.679575,-1.409422]})

# Keep rows where c is at least .5 OR c is missing (NaN)
df = df[(df['c'] >= .5) | (df['c'].isnull())]
print(df)

Output:

          a         b         c
1  1.138774 -0.699224       NaN
2       NaN  2.384483  0.554292
3 -0.069319       NaN  1.162941
4  1.040089 -0.271777       NaN
6 -1.813278 -1.564666       NaN
7       NaN       NaN       NaN
8  0.737413       NaN  0.679575
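If you want this as a reusable step, a shorter equivalent negates the "below threshold" test instead: `c < .5` is also False for NaN, so `~(c < .5)` keeps the NaN rows and drops only values strictly below .5. The helper name `keep_at_least_or_nan` is hypothetical, used only for illustration:

```python
import numpy as np
import pandas as pd

# Column 'c' from the question
df = pd.DataFrame({'c': [-0.020490, np.nan, 0.554292, 1.162941, np.nan,
                         -0.771888, np.nan, np.nan, 0.679575, -1.409422]})


def keep_at_least_or_nan(frame, column, threshold):
    """Hypothetical helper: keep rows whose `column` is >= threshold or NaN.

    `frame[column] < threshold` is False for NaN, so negating it keeps the
    NaN rows and drops only values strictly below the threshold.
    """
    return frame[~(frame[column] < threshold)]


explicit = df[(df['c'] >= .5) | df['c'].isna()]  # the | version from the answer
negated = keep_at_least_or_nan(df, 'c', .5)

print(negated.index.tolist())  # [1, 2, 3, 4, 6, 7, 8]
```

The explicit | form states the intent more clearly; the negated form relies on NaN comparisons being False, which is worth a comment wherever you use it.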

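You should be able to do it like this:

# Use | instead of `or` (which raises ValueError on a Series) and isna() instead of == 'NaN' (NaN never equals anything)
new_df = df[(df['c'] >= .5) | df['c'].isna()]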
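Two things go wrong if the mask is built with `or`, or by comparing against the string 'NaN'. A small sketch on a two-value Series shows both, plus the reliable `isna()` test:

```python
import numpy as np
import pandas as pd

c = pd.Series([0.554292, np.nan])

# `or` needs a single True/False, so pandas raises ValueError for a Series
try:
    c >= .5 or c == 'NaN'
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# A real NaN never equals the string 'NaN' (nor itself), so == cannot find it
print((c == 'NaN').tolist())  # [False, False]

# isna() is the reliable test for missing values
print(c.isna().tolist())  # [False, True]
```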


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM