
AWS Glue PySpark can't count the records

Original post: 2018-05-05 20:20:10 | Tags: amazon-web-services, apache-spark, pyspark, aws-glue

I'm using AWS Glue to extract data from a PostgreSQL database on EC2, transform it, and write it to S3. When I tried to extract a single table, I got an error that looks like this:

Is there anything I can do? I tried dropping null fields and using fillna, but neither worked.

UPDATE: I even selected a single string-type column but still got the same error:

1 Answer

Can you try df.isnull().any() or df.isnull().sum()? This should show which columns contain invalid NaN data. Also, please try fetching the record count with df.count(dropna = False) / df.na.drop(). Please refer here, where handling null column data is explained in more detail.

Hope this helps.

Related questions

Can you use PySpark instead of Glue PySpark in AWS Glue?

Can AWS Glue process records row wise

Filtering DynamicFrame with AWS Glue or PySpark

Change the delimiter in AWS Glue Pyspark

AWS EMR Spark Glue PySpark -

AWS glue job (Pyspark) to AWS glue data catalog

TypeError: 'JavaPackage' object is not callable on PySpark, AWS Glue

How to connect to Redshift from AWS Glue (PySpark)?

use SQL inside AWS Glue pySpark script

How to debug an aws glue pyspark job


Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original post's address.


