简体   繁体   中英

How to replace non integer values in a pandas Dataframe?

I have a dataframe consisting of two columns, Age and Salary

Age   Salary
21    25000
22    30000
22    Fresher
23    2,50,000
24    25 LPA
35    400000
45    10,00,000

How to handle outliers in Salary column and replace them with an integer?

If need replace non numeric values use to_numeric with parameter errors='coerce' :

df['new'] = pd.to_numeric(df.Salary.astype(str).str.replace(',',''), errors='coerce')
              .fillna(0)
              .astype(int)
print (df)
   Age     Salary      new
0   21      25000    25000
1   22      30000    30000
2   22    Fresher        0
3   23   2,50,000   250000
4   24     25 LPA        0
5   35     400000   400000
6   45  10,00,000  1000000

使用numpy在哪里找到非数字值,替换为'0'。

df['New']=df.Salary.apply(lambda x: np.where(x.isdigit(),x,'0'))

If you use Python 3 use the following. I am not sure how other Python versions return type(x). However I would not replace missing or inconsistent values with 0, it is better to replace them with None. But let's say you want to replace string values (outliers or inconsistent values) with 0 :

df['Salary']=df['Salary'].apply(lambda x: 0 if str(type(x))=="<class 'str'>" else x)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM