
Monitoring python shell glue jobs in AWS

The AWS documentation explains how to activate monitoring for Spark jobs ( https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-glue-job-cloudwatch-metrics.html ), but not for Python shell jobs.

Using the code as is gives me this error: ModuleNotFoundError: No module named 'pyspark'

Worse, after commenting out from pyspark.context import SparkContext, I then get ModuleNotFoundError: No module named 'awsglue.context'. It seems Python shell jobs don't have access to the Glue context? Has anyone solved this?

Python shell jobs run in a pure Python environment and do not have access to pyspark (EMR in the backend). You will not be able to access the context attribute here. That is purely a Spark concept, and Glue is essentially a wrapper around pyspark.

I am getting into Glue Python shell jobs more, and resolving some dependencies in code files that are shared between my Spark jobs and pyshell jobs. I was able to resolve the pyspark dependency by adding pyspark==2.4.7 to the requirements.txt used when creating my .egg/.whl file. I picked that version because another library required it.

You still cannot use the pyspark context, as Emerson mentioned above, because this is the Python runtime, not the Spark runtime.
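For code files shared between Spark jobs and Python shell jobs, one option is to guard the Spark-only imports so the module still loads when pyspark or awsglue is missing. This is only a sketch under that assumption; HAS_SPARK and get_glue_context are names made up for illustration, not part of the Glue API.

```python
# Guard the Spark-only imports so the same module can be imported
# from a Glue Spark job and from a Python shell job.
try:
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    HAS_SPARK = True
except ImportError:
    # Python shell runtime: pyspark / awsglue.context are not importable.
    SparkContext = None
    GlueContext = None
    HAS_SPARK = False


def get_glue_context():
    """Return a GlueContext in a Spark job, or None in a Python shell job.

    Hypothetical helper: callers must handle the None case themselves.
    """
    if not HAS_SPARK:
        return None
    return GlueContext(SparkContext.getOrCreate())
```

In a Python shell job get_glue_context() returns None, so any code path that needs the context has to be skipped or replaced with plain Python.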

So when building a distribution with setuptools, you can have a requirements.txt that looks like the one below, and when the shell is set up, it will install these dependencies:

elasticsearch
aws_requests_auth
pg8000
pyspark==2.4.7
awsglue-local
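One way to wire a requirements.txt like this into a setuptools build is to read it into install_requires. A rough sketch only: parse_requirements, build_setup_kwargs, and the package name and version are made up for illustration, and it assumes requirements.txt sits next to setup.py.

```python
from pathlib import Path


def parse_requirements(text):
    """Return requirement specifiers, skipping blank lines and comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def build_setup_kwargs(requirements_path="requirements.txt"):
    """Build keyword arguments for setuptools.setup()."""
    requirements = parse_requirements(Path(requirements_path).read_text())
    return {
        "name": "my_glue_lib",   # hypothetical package name
        "version": "0.1.0",      # hypothetical version
        "install_requires": requirements,
    }


# In setup.py you would then call something like:
#   from setuptools import setup, find_packages
#   setup(packages=find_packages(), **build_setup_kwargs())
```

Building the .egg or .whl from that setup.py then carries the dependency list along with the package.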
