

How to run a MapReduce script through the Hortonworks Sandbox in Python?

I have the Hortonworks Sandbox and ran the command:

ssh root@127.0.0.1 -p 2222;

After logging in, I would like to run MapReduce on two HDFS files, RatingsBreakdown.py and u.data, located under Documents, like I did here:

python RatingsBreakdown.py -r hadoop hdfs:///user/[username]/u.data --hadoop-streaming-jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming.jar
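
(For context: a script launched with -r hadoop and --hadoop-streaming-jar like this is presumably an mrjob job. A minimal sketch of such a script is shown below, assuming u.data is the tab-separated MovieLens ratings file; the actual RatingsBreakdown.py may differ.)

from mrjob.job import MRJob

class RatingsBreakdown(MRJob):
    def mapper(self, _, line):
        # Assumed u.data layout: user_id <tab> movie_id <tab> rating <tab> timestamp
        user_id, movie_id, rating, timestamp = line.split('\t')
        yield rating, 1

    def reducer(self, rating, counts):
        # Count how many times each rating value appears
        yield rating, sum(counts)

if __name__ == '__main__':
    RatingsBreakdown.run()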

How can I adjust the command above so that it runs on the Hadoop cluster?

[root@sandbox ~]#

If RatingsBreakdown.py is an mrjob job, then the command you've shown already does everything you want: the -r hadoop runner submits the job to the cluster. You can open the YARN UI to verify that the job ran on the cluster.
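
If you prefer the command line to the web UI, you can also check from inside the sandbox whether the job reached YARN. This is standard YARN CLI tooling and assumes it is on the PATH in the sandbox; replace <application_id> with the id printed in the listing:

yarn application -list -appStates ALL
yarn application -status <application_id>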

Otherwise, the Hadoop Streaming documentation should point you in the right direction.
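
For reference, a bare Hadoop Streaming run (without mrjob) looks roughly like the sketch below. mapper.py, reducer.py, and the output directory are placeholder names, and the jar path is the one from the question:

hadoop jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming.jar \
    -files mapper.py,reducer.py \
    -mapper "python mapper.py" \
    -reducer "python reducer.py" \
    -input hdfs:///user/[username]/u.data \
    -output hdfs:///user/[username]/streaming-output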
