
Cascading tutorial word count example error

I am learning Cascading. I am now working through the second tutorial on its official website, the Word Count example. I copied the code from it and tried to run it, but it always gives me the following error:

Exception in thread "main" cascading.flow.planner.PlannerException: could not build flow from assembly: [[token][com.starscriber.cascadingtest.Main.main(Main.java:44)] 
unable to resolve argument selector: [{1}:'text'], with incoming: [{1}:'doc01        A rain shadow is a dry area on the lee back side of a mountainous area.']] at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:576)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:263)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:80)
at cascading.flow.FlowConnector.connect(FlowConnector.java:459)
at com.starscriber.cascadingtest.Main.main(Main.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Caused by: cascading.pipe.OperatorException: [token][com.starscriber.cascadingtest.Main.main(Main.java:44)] 
unable to resolve argument selector: [{1}:'text'], with incoming: [{1}:'doc01        A rain shadow is a dry area on the lee back side of a mountainous area.']
at cascading.pipe.Operator.resolveArgumentSelector(Operator.java:345)
at cascading.pipe.Each.outgoingScopeFor(Each.java:368)
at cascading.flow.planner.ElementGraph.resolveFields(ElementGraph.java:628)
at cascading.flow.planner.ElementGraph.resolveFields(ElementGraph.java:610)
at cascading.flow.hadoop.planner.HadoopPlanner.buildFlow(HadoopPlanner.java:248)
... 8 more

Caused by: cascading.tuple.FieldsResolverException: 
could not select fields: [{1}:'text'], from: [{1}:'doc01        A rain shadow is a dry area on the lee back side of a mountainous area.']
at cascading.tuple.Fields.indexOf(Fields.java:1008)
at cascading.tuple.Fields.select(Fields.java:1064)
at cascading.pipe.Operator.resolveArgumentSelector(Operator.java:341)
... 12 more

Why does this happen? I copied exactly the same code from the official GitHub repository and didn't change anything.

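package com.starscriber.cascadingtest;

import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class Main {
    public static void main(String[] args) {
        // args[0]: input path, args[1]: output path
        String docPath = args[0];
        String wcPath = args[1];

        Properties properties = new Properties();
        AppProps.setApplicationJarClass(properties, Main.class);
        HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);

        // create source and sink taps
        // hasHeader = true: the first line of the input supplies the field names (must include "text")
        Tap docTap = new Hfs(new TextDelimited(true, "\t"), docPath);
        Tap wcTap = new Hfs(new TextDelimited(true, "\t"), wcPath);

        // specify a regex operation to split the "document" text lines into a token stream
        Fields token = new Fields("token");
        Fields text = new Fields("text");
        RegexSplitGenerator splitter = new RegexSplitGenerator(token, "[ \\[\\]\\(\\),.]");
        // only returns "token"
        Pipe docPipe = new Each("token", text, splitter, Fields.RESULTS);

        // determine the word counts: group by token, then count each group
        Pipe wcPipe = new Pipe("wc", docPipe);
        wcPipe = new GroupBy(wcPipe, token);
        wcPipe = new Every(wcPipe, Fields.ALL, new Count(), Fields.ALL);

        // connect the taps, pipes, etc., into a flow
        FlowDef flowDef = FlowDef.flowDef()
                .setName("wc")
                .addSource(docPipe, docTap)
                .addTailSink(wcPipe, wcTap);

        // write a DOT file and run the flow
        Flow wcFlow = flowConnector.connect(flowDef);
        wcFlow.writeDOT("dot/wc.dot");
        wcFlow.complete();
    }
}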
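For comparison, here is a plain-Java sketch of what this assembly is meant to compute once the "text" field resolves: split each text value on the same character class passed to RegexSplitGenerator, then group by token and count. It uses no Cascading or Hadoop, so it only approximates the distributed flow; the class name WordCountSketch is hypothetical, and skipping empty tokens is a choice made for this sketch, not a claim about RegexSplitGenerator.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical plain-Java approximation of the word-count pipe assembly (no Cascading involved).
class WordCountSketch {
    // Same regex the tutorial passes to RegexSplitGenerator.
    static final Pattern SPLITTER = Pattern.compile("[ \\[\\]\\(\\),.]");

    // Splits every "text" value into tokens and counts them.
    // TreeMap keeps tokens sorted, loosely mirroring the sorted keys produced by GroupBy.
    static Map<String, Integer> countWords(List<String> texts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String text : texts) {
            for (String token : SPLITTER.split(text)) {
                if (token.isEmpty()) {
                    continue; // adjacent delimiters such as ". " yield empty strings; this sketch drops them
                }
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> texts = List.of(
                "A rain shadow is a dry area on the lee back side of a mountainous area.",
                "Two Women. Secrets. A Broken Land. [DVD Australia]");
        // Print tab-separated token/count pairs, similar in shape to the tutorial's output file.
        countWords(texts).forEach((token, count) -> System.out.println(token + "\t" + count));
    }
}
```

Note that the counting is case-sensitive, so "A" and "a" are counted separately, as they would be by a GroupBy on the raw token.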

Where is the problem?

And this is the input file:

doc01        A rain shadow is a dry area on the lee back side of a mountainous area.
doc02        This sinking, dry air produces a rain shadow, or area in the lee of a mountain with less rain and cloudcover.
doc03        A rain shadow is an area of dry land that lies on the leeward (or downwind) side of a mountain.
doc04        This is known as the rain shadow effect and is the primary cause of leeward deserts of mountain ranges, such as California's Death Valley.
doc05        Two Women. Secrets. A Broken Land. [DVD Australia]

Check whether there is a tab between the two fields, the document ID and the text, in your input file. The program expects two tab-separated fields, but in your case it is reading each whole line into a single field.
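That check can be done before running the flow. Below is a minimal sketch of a hypothetical InputCheck helper (plain Java, not part of Cascading): it splits the first line on tabs the way a tab-delimited header would be read, requires a field named "text" (the only name the tutorial code selects), and flags lines whose tab-separated field count differs from the header. For a space-separated file, the whole first line comes back as a single field name, which matches the incoming field shown in the error.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check for the tutorial's input file (not a Cascading API).
class InputCheck {
    // Splits on tabs only; limit -1 keeps trailing empty columns.
    static List<String> tabFields(String line) {
        return Arrays.asList(line.split("\t", -1));
    }

    // Returns the problems found; an empty list means the file looks usable for a
    // tab-delimited source with a header line that declares a "text" field.
    static List<String> problems(List<String> lines) {
        List<String> found = new ArrayList<>();
        if (lines.isEmpty()) {
            found.add("file is empty");
            return found;
        }
        List<String> header = tabFields(lines.get(0));
        if (!header.contains("text")) {
            found.add("header has no 'text' field, got " + header.size() + " field(s): " + header);
        }
        for (int i = 1; i < lines.size(); i++) {
            int n = tabFields(lines.get(i)).size();
            if (n != header.size()) {
                found.add("line " + (i + 1) + ": " + n + " tab-separated field(s), header has " + header.size());
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> found = problems(lines);
        if (found.isEmpty()) {
            System.out.println("OK");
        } else {
            found.forEach(System.out::println);
        }
    }
}
```

Running it against the file from the question would be expected to report that the header has no "text" field, since its first line contains no tab at all.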

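As others have already mentioned, your input file needs the same header the example expects. Because the source tap is built with new TextDelimited(true, "\t"), the first line of the file is treated as the header and its tab-separated values become the field names. Your file has no header line and no tabs, so its entire first line, 'doc01        A rain shadow is ...', became the only field name, which is exactly the incoming field the error reports; the Each pipe then cannot find a field named "text". Instead of copying the code, try cloning the repository so that you don't run into errors related to file formatting.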
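If cloning is not an option, the pasted file can be repaired instead. A minimal sketch, assuming each line is a document ID, then a run of whitespace, then the text, and assuming the header names doc_id and text (the tutorial code only requires "text"; doc_id is an assumed name for the first column). FixInput is a hypothetical helper, and it does not detect a header that is already present.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rewrites "docId<whitespace>text" lines into a tab-delimited file with a header.
class FixInput {
    // Assumed header; only the "text" column name is required by the tutorial code.
    static final String HEADER = "doc_id\ttext";

    static List<String> toTabDelimited(List<String> lines) {
        List<String> out = new ArrayList<>();
        out.add(HEADER);
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // drop blank lines
            }
            // Replace only the first whitespace run, so spaces inside the text are preserved.
            out.add(trimmed.replaceFirst("\\s+", "\t"));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: space-separated input, args[1]: tab-delimited output with header
        List<String> in = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), toTabDelimited(in), StandardCharsets.UTF_8);
    }
}
```

The repaired file can then be passed as docPath to the unchanged flow.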

The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 