
Pig UDF (Java): index out of bounds error

I have an access problem in Pig with my UDF. I did a GROUP BY and received the output (Andi, 19495), which Pig describes as C: {group: chararray, long}. Now I would like to format the output as a single string, (Andi 19495). But my UDF reports the following error:

"Caught error from UDF: pigUDF.Output, Out of bounds access [Index: 1, Size: 1]"

I do not understand why this is happening.

Here is my java UDF:

package pigUDF;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema;

public class Output extends EvalFunc<Tuple>{
   TupleFactory tupleFactory = TupleFactory.getInstance();
   BagFactory mBagFactory = BagFactory.getInstance();

   private static Tuple nullValue=TupleFactory.getInstance().newTuple(2);

   @Override
   public Tuple exec(Tuple input) throws IOException {

     if (input==null) return nullValue;

     Tuple t= tupleFactory.newTuple(1);

     String o = (String) input.get(0);
     int o1 = (Integer) input.get(1);


     String myString=o+" "+String.valueOf(o1);
     System.out.println(myString);

     t.set(0,myString);

     return t;        
   }

   @Override
   public Schema outputSchema(Schema input){

     Schema tupleSchema = new Schema();
     tupleSchema.add(new FieldSchema("group", DataType.CHARARRAY));        
     Schema s = new Schema (new FieldSchema(null, tupleSchema));
     return s;        
   }    
}
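The error message itself points at the cause: "Index: 1, Size: 1" means the tuple handed to exec() contained only one field, so input.get(1) was out of range. That happens, for example, if the UDF is called with a single argument; the calling script is not shown here, so which call produced the one-field tuple is left open. A second problem would surface once that is fixed: the schema declares the second field as long, so Pig passes a java.lang.Long, and the (Integer) cast throws a ClassCastException. The minimal sketch below shows both checks. It is plain Java and, as an assumption for illustration, uses java.util.List<Object> as a stand-in for Pig's Tuple so it runs without the Pig jar; the class name OutputSketch and the method format are hypothetical, not part of Pig. In a real UDF the same logic would use input.size() and input.get(i).

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch, not a Pig UDF: List<Object> is a hypothetical stand-in for
// org.apache.pig.data.Tuple so this runs without the Pig jar.
class OutputSketch {

    // Builds "name value" from a (chararray, long) row. Returns null when the
    // row does not carry the two expected, non-null fields.
    static String format(List<Object> fields) {
        // get(1) is only valid when there are at least two fields; calling it on
        // a one-field tuple is what "Index: 1, Size: 1" reports.
        if (fields == null || fields.size() < 2) {
            return null;
        }
        Object name = fields.get(0);
        Object value = fields.get(1);
        if (name == null || value == null) {
            return null;
        }
        // A Pig long arrives as java.lang.Long, so an (Integer) cast would throw
        // ClassCastException. Reading it through Number accepts Integer and Long.
        long number = ((Number) value).longValue();
        return name + " " + number;
    }

    public static void main(String[] args) {
        System.out.println(format(Arrays.<Object>asList("Andi", 19495L))); // Andi 19495
        System.out.println(format(Arrays.<Object>asList("Andi")));         // null
    }
}
```

Running main prints "Andi 19495" for the two-field row and "null" for the one-field row, instead of throwing.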

Instead of writing a UDF, you could use CONCAT to achieve the same result.

Concatenate the group with the result of concatenating ' ' and SUM(A). Also, since CONCAT expects expressions of the same type, you will have to cast the numeric SUM(A) value (a long in your schema) to chararray. Try changing

C = FOREACH A GENERATE group, SUM(A);

To

C = FOREACH A GENERATE CONCAT(group,CONCAT(' ',(chararray)SUM(A)));

You can use an alternate approach to get a similar result without invoking a custom UDF.

INPUT

Andi,19495

SCRIPT

A = LOAD 'data.txt' USING PigStorage(',') AS (name:chararray,value:long);

B = FOREACH A GENERATE CONCAT(name,CONCAT(' ',(chararray)value));

DUMP B;

OUTPUT

(Andi 19495)

Hope this helps for now. I will post a UDF-based solution for you later.
