简体   繁体   中英

Pandas ignore non-numeric values

I have this df:

     X
0   13500
1   13600
2   BBOX-001
3   Mobi-1
4   15003
5   15004

I am trying to enter a new column. if x >15000 then the value is A, otherwise B. If X is non-numeric (BBOX-001, Mobi-1), then it should display the value in column X:

     X        Y
0   13500     B
1   13600     B
2   BBOX-001  BBOX-001
3   Mobi-1    Mobi-1
4   15003     A
5   15004     A

I have this below but how do I ignore the non-numeric values in column X?

df['Y'] = np.where(df['X'] > 15000, 'A', 'B')

When df['X'] contains a mix of numbers and strings, the dtype of the column will be object instead of a numeric dtype. The number-like items in df['X'] may be ints or floats or maybe even strings (it's unclear from your question). Many numeric operations such as df['X'] > 15000 may raise errors in this case.

To treat the number-like values as numbers, use pd.to_numeric to convert the column into a numeric dtype:

In [41]: numeric_X = pd.to_numeric(df['X'], errors='coerce')
In [43]: numeric_X
Out[43]: 
0    13500.0
1    13600.0
2        NaN
3        NaN
4    15003.0
5    15004.0
Name: X, dtype: float64

And you can also identify the string-like values by testing for NaNs:

is_stringlike = np.isnan(numeric_X)

import numpy as np
import pandas as pd

df = pd.DataFrame({'X': ['13500', '13600', 'BBOX-001', 'Mobi-1', '15003', '15004']})

numeric_X = pd.to_numeric(df['X'], errors='coerce')
is_stringlike = np.isnan(numeric_X)
conditions = [numeric_X > 15000, is_stringlike]
choices = ['A', df['X']]
df['Y'] = (np.select(conditions, choices, default='B'))
print(df)

yields

          X         Y
0     13500         B
1     13600         B
2  BBOX-001  BBOX-001
3    Mobi-1    Mobi-1
4     15003         A
5     15004         A

You may achieve your goal with convert_objects :

import pandas as pd
import numpy as np

df = pd.DataFrame({'X': ['13500', '13600', 'BBOX-001', 'Mobi-1', '15003', '15004']})
# Convert only numeric value to put it in comparison
df['Y'] = np.where(df.X.convert_objects(convert_numeric=True) > 15000, 'A', 'B')

print (df)

Output:

         X  Y
0     13500  B
1     13600  B
2  BBOX-001  B
3    Mobi-1  B
4     15003  A
5     15004  A

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM