I have an access problem in Pig with my UDF. I did a GROUP BY and received the output (Andi, 19495), which Pig describes as C: {group: chararray, long}.
Now I would like to format the output as a single string, (Andi 19495). But my UDF reports the following:
"Caught error from UDF: pigUDF.Output, Out of bounds access [Index: 1, Size: 1]"
I do not understand why this is happening.
Here is my Java UDF:
package pigUDF;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.logicalLayer.schema.Schema.FieldSchema;

public class Output extends EvalFunc<Tuple>{

    TupleFactory tupleFactory = TupleFactory.getInstance();
    BagFactory mBagFactory = BagFactory.getInstance();
    private static Tuple nullValue=TupleFactory.getInstance().newTuple(2);

    @Override
    public Tuple exec(Tuple input) throws IOException {
        if (input==null) return nullValue;
        Tuple t= tupleFactory.newTuple(1);
        String o = (String) input.get(0);
        int o1 = (Integer) input.get(1);
        String myString=o+" "+String.valueOf(o1);
        System.out.println(myString);
        t.set(0,myString);
        return t;
    }

    @Override
    public Schema outputSchema(Schema input){
        Schema tupleSchema = new Schema();
        tupleSchema.add(new FieldSchema("group", DataType.CHARARRAY));
        Schema s = new Schema (new FieldSchema(null, tupleSchema));
        return s;
    }
}
Instead of writing a UDF, you could use CONCAT and achieve the same result.
CONCAT the group with the result of concatenating ' ' and SUM(A). Also, since CONCAT expects expressions of the same type, you will have to cast the numeric SUM(A) value to chararray. Try changing
C = FOREACH A GENERATE group, SUM(A);
To
C = FOREACH A GENERATE CONCAT(group,CONCAT(' ',(chararray)SUM(A)));
You can use an alternative approach to get a similar result without invoking a custom UDF.
INPUT
Andi,19495
SCRIPT
A = LOAD 'data.txt' USING PigStorage(',') AS (name:chararray,value:long);
B = FOREACH A GENERATE CONCAT(name,CONCAT(' ',(chararray)value));
DUMP B;
OUTPUT
(Andi 19495)
Hope this helps for now. I will also post a UDF-based solution for you later.
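As for why the UDF in the question fails: the message "Index: 1, Size: 1" says that the tuple handed to exec had only one field, so input.get(1) reads past the end. One possible cause (an assumption, since the calling script is not shown) is that the UDF was invoked with a single argument. Separately, the schema reports the second field as long, which arrives in Java as java.lang.Long, so the (Integer) cast would fail even with two fields. Below is a minimal sketch of the defensive checks in plain Java. List<Object> stands in for Pig's Tuple so the example runs without the Pig jars, and the names OutputSketch and format are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch: List<Object> stands in for Pig's Tuple (an assumption
// made so the example runs without the Pig jars).
class OutputSketch {

    // Joins field 0 (name) and field 1 (count) with a single space.
    // Returns null when the input is missing or has fewer than two fields,
    // instead of reading index 1 and failing with an out-of-bounds error.
    static String format(List<Object> input) {
        if (input == null || input.size() < 2) {
            return null;
        }
        Object name = input.get(0);
        Object count = input.get(1);
        if (name == null || count == null) {
            return null;
        }
        // A Pig long arrives as java.lang.Long, so (Integer) count would throw
        // ClassCastException; going through Number accepts Long and Integer.
        long value = ((Number) count).longValue();
        return name + " " + value;
    }

    public static void main(String[] args) {
        // Two fields: formatted as one string.
        System.out.println(format(Arrays.<Object>asList("Andi", 19495L)));
        // One field: the case behind "Index: 1, Size: 1"; here it yields null.
        System.out.println(format(Arrays.<Object>asList("Andi")));
    }
}
```

In a real UDF the same checks would be written against the Tuple itself, using input.size() before input.get(1), and the UDF would need to be called with both fields.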