
TypeError: 'JavaPackage' object is not callable on PySpark, AWS Glue

I've learned Spark in Scala but I'm very new to PySpark and AWS Glue, so I followed this official tutorial by AWS:
https://docs.aws.amazon.com/ja_jp/glue/latest/dg/aws-glue-programming-python-samples-legislators.html

I successfully created a development endpoint, connected to the pyspark REPL via ssh, and typed in these commands:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

glueContext = GlueContext(SparkContext.getOrCreate())

But on the last line, I got:

>>> glueContext = GlueContext(SparkContext.getOrCreate())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/context.py", line 44, in __init__
  File "/usr/share/aws/glue/etl/python/PyGlue.zip/awsglue/context.py", line 64, in _get_glue_scala_context
TypeError: 'JavaPackage' object is not callable
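
One way to see why the constructor fails is to check whether the Glue Scala classes are actually visible to the JVM. This is a minimal diagnostic sketch, assuming the Glue context class lives at com.amazonaws.services.glue.GlueContext (py4j resolves an unknown name to a non-callable JavaPackage, while a loaded class shows up as a JavaClass):

from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()

# Assumption: the Glue Scala context class is com.amazonaws.services.glue.GlueContext.
# If the Glue jars are on the classpath this prints py4j.java_gateway.JavaClass;
# if they are not, py4j falls back to py4j.java_gateway.JavaPackage, which is not callable.
glue_cls = sc._jvm.com.amazonaws.services.glue.GlueContext
print(type(glue_cls))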

I also tried importing py4j manually, but it just didn't work.

How can I fix this? Any help would be appreciated.

Finally, I solved it myself. It looks like it was a Glue/AWS-specific issue, not a Spark or Python one.

After several trials, I got an error message saying a "ListObject" operation had failed when starting the Spark (pyspark) REPL. ListObject is the name of the boto3 API call used to access contents on S3.
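
Since the REPL complained about ListObject, a quick way to confirm whether the endpoint's role can actually read S3 is a direct boto3 call. This is a minimal sketch; the bucket name is a placeholder, not something from the original setup:

import boto3

s3 = boto3.client("s3")

# "my-glue-assets-bucket" is a placeholder; use whichever bucket the endpoint reads from.
resp = s3.list_objects_v2(Bucket="my-glue-assets-bucket", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"])

# An AccessDenied error here points at the IAM role rather than at Spark or Python.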

So I checked its IAM role, which already had AWSGlueConsoleFullAccess (with some S3 access included), attached the AmazonS3FullAccess policy to it, and the error disappeared. I also created another glue-development-endpoint cluster, and there was no error on the new cluster either, even without S3FullAccess.
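
For reference, attaching the policy can also be done programmatically instead of through the console. This is a minimal sketch with boto3, where the role name is a placeholder for the development endpoint's actual IAM role:

import boto3

iam = boto3.client("iam")

# "AWSGlueServiceRole-dev-endpoint" is a placeholder for the endpoint's real role name.
iam.attach_role_policy(
    RoleName="AWSGlueServiceRole-dev-endpoint",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)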

Maybe every time Spark is started on a Glue cluster, the cluster automatically tries to fetch some update from a designated S3 bucket, and it ran into trouble because the cluster was built just before an update release.
