Python: Replace special characters with NULL in each column of a pandas DataFrame

I have a dataframe as follows:

ID      Date_Loading          Date_delivery       Value
001     01.11.2017             20.11.2017         200.34
002     %^&**##_               15.01.2018         300.05
003     11.12.2018             _%67*              7*7%
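To make the example reproducible, here is a minimal sketch that loads the table above with read_csv from an in-memory string. The whitespace separator is an assumption, since the question does not say how the real file is delimited.

```python
import io
import pandas as pd

# The table from the question, pasted as whitespace-separated text
data = """ID      Date_Loading          Date_delivery       Value
001     01.11.2017             20.11.2017         200.34
002     %^&**##_               15.01.2018         300.05
003     11.12.2018             _%67*              7*7%
"""

# sep=r'\s+' splits on runs of spaces (an assumption about the file layout).
# ID is read as an integer column (leading zeros are lost); the other columns
# stay as strings (object dtype) because of the junk values.
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
print(df)
print(df.dtypes)
```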

As you can see, every column except ID contains special characters.

Objective: replace those special characters with None, so the final dataframe should look like this:

ID      Date_Loading          Date_delivery       Value
001     01.11.2017             20.11.2017         200.34
002     Null                   15.01.2018         300.05
003     11.12.2018             Null               Null

As a next step, I want to parse the date columns into YYYY-MM-DD format.

To do this, I am using the following code snippet:

for c in df.columns.tolist():
  df[c] = df[c].astype(str).str.replace(r"[^A-Za-z0-9]"," ")
df['Date_Loading'] = pd.to_datetime(df['Date_Loading'],error='coerce',format='YYYY-MM-DD')
df['Date_delivery'] = pd.to_datetime(df['Date_Loading'],error='coerce',format='YYYY-MM-DD')

But the code above does not work. Even the replacement step on its own has no effect.

Am I missing something?

PS: I have tried the suggestions from these Stack Overflow posts (this and this), but so far no luck.

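Note that the keyword is errors, not error, and that the question's second to_datetime call parses Date_Loading instead of Date_delivery. The format argument describes the input strings with strftime codes, so for DD.MM.YYYY it is '%d.%m.%Y' rather than 'YYYY-MM-DD'; numbers can be converted with to_numeric: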

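import pandas as pd

# No regex cleanup is needed: errors='coerce' turns values that do not match
# the format into NaT/NaN. Stripping all non-alphanumeric characters first, as in
# df = df.astype(str).replace(r"[^A-Za-z0-9]", "", regex=True)
# would also remove the '.' separators ('01.11.2017' -> '01112017') and break parsing.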

df['Date_Loading'] = pd.to_datetime(df['Date_Loading'],errors='coerce',format='%d.%m.%Y')
df['Date_delivery'] = pd.to_datetime(df['Date_delivery'],errors='coerce',format='%d.%m.%Y')

df['Value'] = pd.to_numeric(df['Value'],errors='coerce')
print (df)
   ID Date_Loading Date_delivery   Value
0   1   2017-11-01    2017-11-20  200.34
1   2          NaT    2018-01-15  300.05
2   3   2018-12-11           NaT     NaN

print (df.dtypes)
ID                        int64
Date_Loading     datetime64[ns]
Date_delivery    datetime64[ns]
Value                   float64
dtype: object
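To see why the regex cleanup in the question does not produce nulls, here is a small sketch that applies the question's pattern to three sample strings from the table and compares the result with plain coercion. It then formats the parsed dates back to YYYY-MM-DD strings with Series.dt.strftime, in case strings rather than datetime64 values are wanted. regex=True is passed explicitly: since pandas 2.0, Series.str.replace treats the pattern as a literal string by default, so the question's call without it replaces nothing on current versions.

```python
import pandas as pd

# Sample strings taken from the question's date columns
s = pd.Series(['01.11.2017', '%^&**##_', '_%67*'])

# The question's cleanup: every non-alphanumeric character becomes a space,
# so the dots inside the valid date are destroyed and junk becomes blanks, not nulls
cleaned = s.str.replace(r"[^A-Za-z0-9]", " ", regex=True)
print(cleaned.tolist())

# Coercion alone: the valid date parses, everything else becomes NaT
parsed = pd.to_datetime(s, format='%d.%m.%Y', errors='coerce')
print(parsed.tolist())

# Optional: YYYY-MM-DD strings instead of datetime64 values (NaT stays missing)
as_text = parsed.dt.strftime('%Y-%m-%d')
print(as_text.tolist())
```

Only the coerced version turns the junk into real missing values; the regex version leaves whitespace strings that to_datetime would still have to reject.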

EDIT:

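import pandas as pd

# Parse both date columns while reading; strings that do not match become NaT.
# Note: the date_parser argument is deprecated since pandas 2.0.
dateparse = lambda x: pd.to_datetime(x, format='%d.%m.%Y', errors='coerce')

df = pd.read_csv(file, parse_dates=['Date_Loading','Date_delivery'], date_parser=dateparse)
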
print (df)
   ID Date_Loading Date_delivery   Value
0   1   2017-11-01    2017-11-20  200.34
1   2          NaT    2018-01-15  300.05
2   3   2018-12-11           NaT    7*7%
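Because date_parser is deprecated as of pandas 2.0, the read_csv documentation suggests reading the columns as object and applying to_datetime afterwards. A minimal sketch of that, which also converts Value so that '7*7%' becomes NaN instead of staying a string. The in-memory string with a whitespace separator stands in for file and is an assumption about the real file's layout.

```python
import io
import pandas as pd

# Stand-in for the real file; the whitespace separator is an assumption
data = """ID Date_Loading Date_delivery Value
001 01.11.2017 20.11.2017 200.34
002 %^&**##_ 15.01.2018 300.05
003 11.12.2018 _%67* 7*7%
"""

df = pd.read_csv(io.StringIO(data), sep=r'\s+')

# Convert after reading: unparseable strings become NaT (dates) or NaN (numbers)
for col in ['Date_Loading', 'Date_delivery']:
    df[col] = pd.to_datetime(df[col], format='%d.%m.%Y', errors='coerce')
df['Value'] = pd.to_numeric(df['Value'], errors='coerce')

print(df)
print(df.dtypes)
```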
