Cannot run Hadoop streaming job: Missing required options: input, output

I'm trying to run a streaming job on a cluster of DSE 3.1 analytics servers, using Cassandra CFs for input. It complains about the input and output parameters even though they are set (I only set them because of the complaint):

dse hadoop jar $HADOOP_HOME/lib/hadoop-streaming-1.0.4.8.jar \
-D cassandra.input.keyspace="tmp_ks" \
-D cassandra.input.partitioner.class="MurMur3Partitioner" \
-D cassandra.input.columnfamily="tmp_cf" \
-D cassandra.consistencylevel.read="ONE" \
-D cassandra.input.widerows=true \
-D cassandra.input.thrift.address=10.0.0.1
-inputformat org.apache.cassandra.hadoop.ColumnFamilyInputFormat \
-outputformat org.apache.hadoop.mapred.lib.NullOutputFormat \
-input /tmp_ks/tmp_cf \
-output /dev/null \
-mapper mymapper.py \
-reducer myreducer.py

I got "ERROR streaming.StreamJob: Missing required options: input, output". I've tried different inputs and outputs and different output formats, but got the same error.

What have I done wrong?

I notice that this part of your command doesn't have a trailing backslash:

...
-D cassandra.input.thrift.address=10.0.0.1
...

Maybe that's screwing up the lines that follow?
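
To illustrate the point about the trailing backslash, here is a minimal shell sketch. `count_args` is a hypothetical helper (not part of Hadoop or DSE) that only prints how many arguments it received, and the option values are placeholders. Without the backslash, the shell ends the command at the newline and treats the next line as a separate command.

```shell
# Hypothetical helper, not part of Hadoop or DSE: prints its argument count.
count_args() { echo "$#"; }

# Every continued line ends with a backslash, so all four words
# are passed to a single command.
joined=$(count_args -D key=value \
  -input /some/path)

# The first line has no trailing backslash, so the command ends there
# with only two arguments. The second line runs as its own command;
# it starts with ":" (the shell no-op) so this demo stays harmless.
broken=$(count_args -D key=value
  : -input /some/path)

echo "joined=$joined broken=$broken"   # prints "joined=4 broken=2"
```

If this is what happened in the original command, everything from `-inputformat` onward never reached the streaming jar, which would be consistent with StreamJob reporting `input` and `output` as missing.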

The input should be an existing path on HDFS, and the output should be a path that does not yet exist on HDFS.

I also noticed this problem with your command:

...    
-D cassandra.input.partitioner.class="MurMur3Partitioner" \
...

The class should be "Murmur3Partitioner".
