Can I actually run a Spark job on a mocked EMR cluster?
Using moto I was able to mock an EMR cluster:
import boto3
import moto

with moto.mock_emr():
    client = boto3.client('emr', region_name='us-east-1')
    client.run_job_flow(
        Name='my_cluster',
        Instances={
            'MasterInstanceType': 'c3.xlarge',
            'SlaveInstanceType': 'c3.xlarge',
            'InstanceCount': 3,
            'Placement': {'AvailabilityZone': 'us-east-1a'},
            'KeepJobFlowAliveWhenNoSteps': True,
        },
        VisibleToAllUsers=True,
    )
    summary = client.list_clusters()
    cluster_id = summary["Clusters"][0]["Id"]
    res = client.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[
            {
                "Name": "foo_step",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {"Args": [], "Jar": "command-runner.jar"},
            }
        ],
    )
The added step seems to be in a STARTING state all the time. Is it possible to actually submit a Spark job to the mocked cluster and run it there?
I am building a utility that submits jobs to EMR clusters and I want to test it. I want to run a trivial Spark job using this utility, and this is where the question comes from. Note that I'm not interested in a Spark cluster itself or in testing the correctness of the submitted Spark job. I am more interested in testing the flow of submitting a job to EMR and examining the results (which ideally should be persisted to a mocked S3 bucket).
It's not possible: mock_emr is just a mock (a stand-in for the real request), so steps are recorded but never executed. For your purpose you can develop the Spark job against mock_s3 instead, and pass configuration to Spark so that it reads from the mocked S3.
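One way to wire a real local Spark session to moto's mocked S3, as the answer suggests, is to run moto in standalone server mode (`pip install 'moto[server]'`, then start it on a local port) and point the `s3a` connector at that endpoint. A configuration sketch, assuming pyspark and the `hadoop-aws` package are available; the port and credentials are placeholders:

```python
from pyspark.sql import SparkSession

# Assumes a moto server is already listening locally, e.g. on port 5000;
# moto accepts any credentials, so dummy values are fine.
spark = (
    SparkSession.builder
    .appName("moto-s3-test")
    .config("spark.hadoop.fs.s3a.endpoint", "http://127.0.0.1:5000")
    .config("spark.hadoop.fs.s3a.access.key", "testing")
    .config("spark.hadoop.fs.s3a.secret.key", "testing")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
    .getOrCreate()
)

# Reads go through s3a to the moto server instead of real S3.
df = spark.read.text("s3a://results-bucket/output/")
```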