
How to submit a Hadoop MR job remotely to an Amazon EMR cluster

Current situation: I have an EMR cluster. On the master node, I have a Python program that makes a subprocess call to execute a script containing the following line. The subprocess triggers the MR job, which writes its output to HDFS for later use.

/usr/bin/hadoop jar test.jar testing.jobs.TestFeatureJob /in/f1.txt /in/f2.txt
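
For context, the wrapper on the master node might look roughly like the sketch below (the jar, class name, and input paths are from the question; the helper function name is hypothetical):

import subprocess

def run_feature_job():
    # Runs the MR job on the EMR master node; the job writes its output to HDFS.
    cmd = [
        "/usr/bin/hadoop", "jar", "test.jar",
        "testing.jobs.TestFeatureJob",
        "/in/f1.txt", "/in/f2.txt",
    ]
    # check=True raises CalledProcessError if the hadoop command exits non-zero.
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_feature_job()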

What do I want to do? I want to decouple this part: run the Python program locally on my laptop (or on a separate EC2 instance) but still submit the MR job to the EMR cluster. Let's say test.jar lives on the EMR master node.

How do I submit this remotely? I am using Python, and let's assume the JAR is a black box. Is there a package I can use to submit the jobs? Do I have to specify something like the master node's IP to be able to run this?

Basically, once the Hadoop configuration is set up on the remote machine, you can run Hadoop or Spark jobs remotely.

Here is a link to AWS documentation for remote spark-submit, but the same steps apply to MR: once you finish them, hadoop jar should work from the remote machine.

https://aws.amazon.com/premiumsupport/knowledge-center/emr-submit-spark-job-remote-cluster/
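
A minimal sketch of what the submission could look like from the laptop or EC2 instance after those steps, assuming the Hadoop client is installed there, /etc/hadoop/conf has been copied from the EMR master as the article describes, and test.jar has also been copied to the submitting machine (hadoop jar loads the driver class from a local jar):

import os
import subprocess

def submit_feature_job_remotely():
    env = os.environ.copy()
    # Point the Hadoop client at the cluster's ResourceManager/NameNode via
    # the configuration copied from the EMR master node (assumed path).
    env["HADOOP_CONF_DIR"] = "/etc/hadoop/conf"
    cmd = [
        "hadoop", "jar", "test.jar",      # local copy of the jar
        "testing.jobs.TestFeatureJob",
        "/in/f1.txt", "/in/f2.txt",       # HDFS input paths on the cluster
    ]
    subprocess.run(cmd, check=True, env=env)

if __name__ == "__main__":
    submit_feature_job_remotely()

Note that the cluster's security groups must also allow the remote machine to reach the relevant Hadoop ports, so the client can actually talk to the ResourceManager and NameNode.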
