CSV file Infinity value issue with AWS Glue job

I have a CSV file that I read with pandas, and I am trying to convert NaN and Infinity values to 0.0. When I run the following code locally, the conversion works as expected:

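import numpy as np
import pandas as pd

df = pd.read_csv('test.csv')
print(df['C1'])  # before cleaning
# Turn +/-inf into NaN, then fill every NaN with 0.0
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df = df.fillna(0.00)
print(df['C1'])  # after cleaning

Output: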
0    NaN
1    inf
2    NaN
Name: C1, dtype: float64
0    0.0
1    0.0
2    0.0
Name: C1, dtype: float64

Here, both the infinity and NaN values are converted to 0.0, as the output shows. But when I run the same logic in an AWS Glue Python Shell job, the infinity value is not converted to 0.0. The code and output for the Glue job are below:

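import numpy as np
import pandas as pd

df = pd.read_csv('s3://bucket/test.csv')
print(df['C1'])  # before cleaning
# Same steps as the local run: +/-inf to NaN, then NaN to 0.0
df = df.replace([np.inf, -np.inf], np.nan)
df = df.fillna(0.00)
print(df['C1'])  # after cleaning

Output: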
0         NaN
1    Infinity
2         NaN
Name: C1, dtype: object
0           0
1    Infinity
2           0
Name: C1, dtype: object

The same file is used locally and on S3, but only the Glue job fails to convert the infinity value. Also, the column is read as float64 locally but as object in Glue. How can I fix this?
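
The object dtype in the Glue output suggests that "Infinity" reached the DataFrame as a string rather than as a float. The following is a minimal sketch of that situation, assuming this is what happened in the Glue environment; the series is built by hand as a stand-in for the C1 column and is not read from the real file.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for C1 as the Glue job printed it:
# dtype object, with "Infinity" stored as a string (an assumption).
c1 = pd.Series([np.nan, "Infinity", np.nan], dtype=object, name="C1")

# np.inf is a float, so it never compares equal to the string "Infinity".
replaced = c1.replace([np.inf, -np.inf], np.nan)
filled = replaced.fillna(0.0)

print(filled.dtype)
print(filled.tolist())
```

If the second element is still the string "Infinity" after the fill, the replace step had nothing to match, which is consistent with the Glue output above.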

I was able to resolve it based on BdR's response in the comments, so here is the answer:

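import numpy as np
import pandas as pd
# Parse the literal strings "Infinity" and "-Infinity" as NaN so the column stays numeric
df = pd.read_csv(input_path, na_values=["Infinity", "-Infinity"])
df = df.replace([np.inf, -np.inf], np.nan)
df = df.fillna(0.00)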
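
To try the fix without access to S3, the same steps can be run against an in-memory CSV. This is only a sketch: `load_and_clean` is a hypothetical helper name, and the sample data is made up to resemble the C1 column from the question.

```python
import io

import numpy as np
import pandas as pd


def load_and_clean(source):
    """Read a CSV and turn NaN, inf, and 'Infinity' strings into 0.0.

    Hypothetical helper; `source` is anything pd.read_csv accepts,
    such as a path or a file-like object.
    """
    # Treat the literal strings as missing values so the column parses as
    # float64 even if the parser does not recognise "Infinity" as a float.
    df = pd.read_csv(source, na_values=["Infinity", "-Infinity"])
    # Catch infinities that were parsed as real floats (for example "inf").
    df = df.replace([np.inf, -np.inf], np.nan)
    return df.fillna(0.0)


# In-memory stand-in for test.csv (made-up sample data).
sample = io.StringIO("C1,C2\n,1\nInfinity,2\n,3\n")
cleaned = load_and_clean(sample)
print(cleaned["C1"])
```

Because `na_values` is applied while parsing, the column should come back as float64 and `fillna` then covers every missing entry, so the result does not depend on how the installed pandas version handles the "Infinity" string.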

