
AWS Glue PySpark can't count the records

I'm using AWS Glue to extract data from EC2 (PostgreSQL), transform it, and load it into S3. When I tried to extract one table, I got an error that looks like this:

[screenshot of the error]

Is there anything I can do? I tried to drop null fields or fillna, but none of those works.

UPDATE: I even selected a string-type column, but I still got the same error: [screenshot of the error]

Can you try df.isnull().any() or df.isnull().sum()? This should help us see which columns contain invalid NaN data. Also, please try to fetch the record count with df.count(), or drop the null rows first with df.na.drop(). (Note that df.isnull() is the pandas API; df.na.drop() is the Spark DataFrame equivalent, so convert your Glue DynamicFrame to the appropriate type first.) Please refer here, where handling null column data is explained in more detail.
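To make the pandas-side checks concrete, here is a minimal sketch with a made-up two-column DataFrame (the column names and data are illustrative, not from your table); the Spark equivalents are noted in comments:

```python
import pandas as pd

# Hypothetical sample data with one null value in the "name" column
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", None, "c"]})

# Which columns contain at least one null? (pandas)
print(df.isnull().any())

# How many nulls per column? (pandas)
print(df.isnull().sum())

# Record count after dropping rows containing nulls (pandas).
# In PySpark the rough equivalent would be: df.na.drop().count()
print(len(df.dropna()))  # 2
```

If the count succeeds after dropping nulls, that points at the null handling rather than the Glue connection itself.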

Hope this helps.
