
Installing AWS Glue ETL Library

Issue

I am getting the error below after setting up the AWS Glue library:

PS C:\Users\[user]\Documents\[company]\projects\code\data-lake\etl\tealium> python visitor.py
20/04/05 19:33:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "visitor.py", line 9, in <module>
    glueContext = GlueContext(sc.getOrCreate())
  File "C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\PyGlue.zip\awsglue\context.py", line 45, in __init__
  File "C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\PyGlue.zip\awsglue\context.py", line 66, in _get_glue_scala_context
TypeError: 'JavaPackage' object is not callable

Scenario

I am trying to install the AWS Glue ETL library in a virtual environment using Pipenv, so I've got the below .env file with the environment variables:

HADOOP_HOME="C:\Users\[user]\AppData\Local\Spark\winutils"
SPARK_HOME="C:\Users\[user]\AppData\Local\Spark\spark-2.4.3-bin-hadoop2.8\spark-2.4.3-bin-hadoop2.8\spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8"
JAVA_HOME="C:\Program Files\Java\jdk1.8.0_231"
PATH="${HADOOP_HOME}\bin"
PATH="${SPARK_HOME}\bin:${PATH}"
PATH="${JAVA_HOME}\bin:${PATH}"
SPARK_CONF_DIR="C:\Users\[user]\Documents\[company]\projects\code\aws-glue-libs-glue-1.0\conf"
PYTHONPATH="${SPARK_HOME}/python/:${PYTHONPATH}"
PYTHONPATH="${SPARK_HOME}/python/lib/py4j-0.10.7-src.zip:${PYTHONPATH}"
PYTHONPATH="C:/Users/[user]/Documents/[company]/projects/code/aws-glue-libs-glue-1.0/PyGlue.zip:${PYTHONPATH}" 

My code is initially quite simple; I am only creating the Glue context, as below:

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.conf import SparkConf

sc = SparkContext()

glueContext = GlueContext(sc.getOrCreate())

print(glueContext)
print(sc)

Do you guys know what this issue may be?

Try this instead: if you create a new Glue job, it will give you boilerplate code which solves your problem.
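
Roughly, that generated boilerplate looks like the sketch below (a sketch rather than the exact console output, assuming Glue 1.0 / Spark 2.4 and a single JOB_NAME job parameter; when running the script locally you can drop the getResolvedOptions / Job lines):

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

# Note: GlueContext is built from the SparkContext directly,
# not from sc.getOrCreate() as in the question.
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# ... your ETL logic goes here ...

job.commit()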
