

How to run a MapReduce script through the Hortonworks Sandbox in Python?

I have the Hortonworks Sandbox and ran the command:

ssh root@127.0.0.1 -p 2222;

After logging in, I would like to run MapReduce on two HDFS files, RatingsBreakdown.py and u.data, located under Documents, like I did here:

python RatingsBreakdown.py -r hadoop hdfs:///user/[username]/u.data --hadoop-streaming-jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming.jar
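
(For context: a script launched with -r hadoop and --hadoop-streaming-jar like this is presumably an mrjob job. A minimal sketch of such a script is shown below, assuming u.data is the tab-separated MovieLens ratings file; the actual RatingsBreakdown.py may differ.)

from mrjob.job import MRJob

class RatingsBreakdown(MRJob):
    def mapper(self, _, line):
        # Assumed u.data layout: user_id <tab> movie_id <tab> rating <tab> timestamp
        user_id, movie_id, rating, timestamp = line.split('\t')
        yield rating, 1

    def reducer(self, rating, counts):
        # Count how many times each rating value appears
        yield rating, sum(counts)

if __name__ == '__main__':
    RatingsBreakdown.run()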

How can I adjust the command above so that it runs on the Hadoop cluster?

[root@sandbox ~]#

If RatingsBreakdown.py is an mrjob job, then the command you've shown already does everything you want: the -r hadoop runner submits the job to the cluster. You can open the YARN UI to verify that the job ran on the cluster.
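
If you prefer the command line to the web UI, you can also check from inside the sandbox whether the job reached YARN. This is standard YARN CLI tooling and assumes it is on the PATH in the sandbox; replace <application_id> with the id printed in the listing:

yarn application -list -appStates ALL
yarn application -status <application_id>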

Otherwise, the Hadoop Streaming documentation should point you in the right direction.
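
For reference, a bare Hadoop Streaming run (without mrjob) looks roughly like the sketch below. mapper.py, reducer.py, and the output directory are placeholder names, and the jar path is the one from the question:

hadoop jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-streaming.jar \
    -files mapper.py,reducer.py \
    -mapper "python mapper.py" \
    -reducer "python reducer.py" \
    -input hdfs:///user/[username]/u.data \
    -output hdfs:///user/[username]/streaming-output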
