Impala: error when querying data produced by a Java UDF

To be clear, I am not asking for help understanding my UDF code or getting it to do its job (I know it does). I want to know why I get an error when later queries read the String it generates.

I wrote a custom UDF with the following code:

import java.util.HashMap;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class Calculate_states extends UDF
{

        HashMap<String, Long> last_ts = 
                new HashMap<String, Long>();

        HashMap<String, Integer> last_val = 
                new HashMap<String, Integer>();

        HashMap<String, Long> ts_last_start = 
                new HashMap<String, Long>();

        HashMap<String, String> start_type = 
                new HashMap<String, String>();


  public  String evaluate( Integer bit,  Long ts,  Long next_ts, Integer next_bit, Integer time, String Ut)
  {

    Object[] result = new Object[4];
    String estado = new String();

    if(bit==null)
    {
        result[0]=new Text("no state");

    }
    else 

    {
        if(bit==1 && (

                    (
                    ( next_ts == null ||  ((next_ts-ts)/1000) > time )
                    &&
                    ( last_ts.get(Ut) == null  || ((ts-last_ts.get(Ut))/1000) > time  ) 
                    )

                    ||
                    (
                    (last_val.get(Ut)!=null) && 
                    last_val.get(Ut)==0 && ((ts-last_ts.get(Ut))/1000) <=time && 
                        (next_ts == null ||
                        (next_ts-ts)/1000 > time)
                    )
                    ||
                    (
                            (next_bit!=null) && // Required condition to avoid problems with nulls
                            (       next_bit==0 && ((next_ts-ts)/1000) <= time && 
                            (   (last_ts.get(Ut) == null ||
                                ((ts-last_ts.get(Ut))/1000) > time )

                            )
                            )
                    )
                    )
                )
             { estado= "isolated point";
            result[0]=new Text("isolated point");}

        else if 
    (

            bit==1 && 
            (
            last_val.get(Ut) != null && // To avoid problems with nulls
            last_val.get(Ut)==0 && ((ts-last_ts.get(Ut))/1000 ) <=time)
    ){  estado= "start";
        result[0]=new Text("start");}

    else if 
    (
            bit==0 && 
            ( last_val.get(Ut) != null && // To avoid problems with nulls
            last_val.get(Ut)==1 && ((ts-last_ts.get(Ut))/1000 ) <=time )
    ){estado= "stop";
        result[0]=new Text("stop");}    
    else if 
    (
            bit==1 && (last_ts.get(Ut)==null ||  ((ts-last_ts.get(Ut))/1000 ) > time  )
    ){estado= "no info start";
        result[0]=new Text("no info start");}
     else if 
    (
            bit==1 && (next_bit==null || ((next_ts-ts)/1000 ) > time  )
    ){estado= "no info stop";
        result[0]=new Text("no info stop");} 
    else if
    (bit==1 ){
        result[0]=new Text("working");}
    else if
    (bit==0 ){
        result[0]=new Text("stopped");}
        // Update values
        last_val.put(Ut,bit);
        last_ts.put(Ut,ts);
    }

    if (estado.equals("isolated point"))
    { result[1]= new LongWritable(1);
    // Could be the sampling frequency, as a new parameter
    result[2]= new Text("isolated point");
      result[3]= new LongWritable(ts);
    } 

    else if ( 
     estado.equals("start") || 
     estado.equals("no info start")
      ){
        ts_last_start.put(Ut,ts);
        start_type.put(Ut,estado);
        //result[2]=null;
        result[3]=new LongWritable(ts);
        }
    else if ( 
            estado.equals("stop") || 
            estado.equals("no info stop")
              ){
                result[3]=new LongWritable(ts_last_start.get(Ut));
                result[1]= new LongWritable((ts-ts_last_start.get(Ut))/1000);
                result[2]= new Text(start_type.get(Ut)+"-"+estado); 
                ts_last_start.put(Ut,null);
                }
    else 
        //result[2]=null;
        if (ts_last_start.get(Ut) == null)
        {
            result[3] =null;
        }
        else
        result[3]=new LongWritable(ts_last_start.get(Ut));

    String resultado="";
    for (int i=0;i<4;i++)
    {
    if (i==3)
    resultado=resultado+String.valueOf(result[i]);
    else
        resultado=resultado+String.valueOf(result[i])+";";
    }

    return resultado;
  }
}

Its purpose is to calculate the states of a component (when it starts working, when it stops working) and to attach an identifier to all the rows between the start and the stop. A bit of 1/0 means the component is working/not working.

So, for example, this query (called QUERY below):

select 
ut,ts, bit, 
calculate_states(bit,ts,if (bit is null, null,next_ts),next_bit,1,ut) as states
from 
(
select 
ut,
ts,
bit, -- Means component bit
last_value(bit ignore nulls) over (partition by ut order by ts desc rows between 1 preceding and 1 preceding) as next_bit,
min(if (bit is not null, ts, null)) over (partition by ut order by ts desc rows between unbounded preceding and 1 preceding) as next_ts
from my_table
order by 1,2
)b
order by 1,2;

returns the following for my table:

UT  |  ts  |  bit |   States
a     1000     0      stopped;null;null;null
a     2000     0      stopped;null;null;null
a     3000     0      stopped;null;null;null
a     4000     1      start;null;null;4000
a     5000     1      no info stop;2;start-no info stop;4000
a     6000     null   no state;null;null;null
a     7000     1      no info start;null;null;7000
a     8000     1      working;null;null;7000
a     9000     0      stop;3;no info start-stop;7000
a     10000    1      start;null;null;10000
a     11000    1      working;null;null;10000
a     12000    1      no info stop;3;start-no info stop;10000

Everything is correct up to here. Now I just wrap it in

select * from QUERY order by ut,ts

or

create table new_table as QUERY and select * from new_table order by ut,ts

I then get this error in my log:

UDF WARNING: Hive UDF path=hdfs://mypath class=UDFpackImpala.Calculate_states failed due to: ImpalaRuntimeException: UDF::evaluate() ran into a problem.
CAUSED BY: ImpalaRuntimeException: UDF failed to evaluate
CAUSED BY: InvocationTargetException: null
CAUSED BY: NullPointerException: null

My result then changes from the one shown above to something like:

UT  |  ts  |  bit |   States
a     1000     0      stopped;null;null;null
a     2000     0      stopped;null;null;null
a     3000     0      NULL
a     4000     1      stop;null;null;4000
a     5000     1      working;null;null;null
a     6000     null   start;null;null;null
a     7000     1      working;null;null;null
a     8000     1      working;null;null;null
a     9000     0      stop;-1;no info start-stop;10000
a     10000    1      start;null;null;10000
a     11000    1      working;null;null;10000
a     12000    1      isolated point;1;null;12000

The states look completely random. My question is: why?

Impala version is: 2.9.0-cdh5.12.2

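It all happened because Impala does not respect the order by clause in the first (inner) select unless you also include a limit. The UDF keeps state between calls in its HashMaps (last_ts, last_val, ts_last_start, start_type), so its output depends on the order in which rows reach it. Once the inner ordering is dropped, rows arrive in an arbitrary order and the computed states no longer make sense.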
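
To illustrate why row order matters for a UDF like this, here is a minimal sketch in plain Java, with no Hive or Impala involved. StatefulLabeler is a hypothetical, heavily reduced stand-in for Calculate_states: it only remembers the last bit seen per key. The class names, method names and sample rows are assumptions made up for the illustration, not part of the original UDF.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderDependenceDemo {

    // Hypothetical, reduced stand-in for the UDF: it remembers the last bit
    // seen per key and labels each row relative to that remembered value.
    static class StatefulLabeler {
        private final Map<String, Integer> lastVal = new HashMap<>();

        String evaluate(String ut, int bit) {
            Integer prev = lastVal.get(ut);
            String label;
            if (bit == 1 && (prev == null || prev == 0)) {
                label = "start";
            } else if (bit == 0 && prev != null && prev == 1) {
                label = "stop";
            } else if (bit == 1) {
                label = "working";
            } else {
                label = "stopped";
            }
            lastVal.put(ut, bit); // state carried over to the next call
            return label;
        }
    }

    // Feeds rows of {ts, bit} to a fresh labeler in exactly the given order.
    static List<String> label(int[][] rows) {
        StatefulLabeler udf = new StatefulLabeler();
        List<String> out = new ArrayList<>();
        for (int[] row : rows) {
            out.add(row[0] + ":" + udf.evaluate("a", row[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        // Same four rows, once sorted by ts and once in an arbitrary order.
        int[][] ordered = {{1000, 0}, {2000, 1}, {3000, 1}, {4000, 0}};
        int[][] shuffled = {{3000, 1}, {1000, 0}, {4000, 0}, {2000, 1}};
        System.out.println(label(ordered));
        System.out.println(label(shuffled));
    }
}
```

With the rows sorted by ts, the row at 3000 is labeled "working"; with the same rows in a different order it is labeled "start". The same kind of drift is what shows up in the second result table.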

If you put a large limit, for example limit 99999999999, after the first order by 1,2, the problem is solved.
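
As for the NullPointerException in the log: one plausible source (an assumption, since the trace does not give a line number) is unboxing a null map value. The UDF stores null with ts_last_start.put(Ut,null) after a "stop", and later expressions such as ts-ts_last_start.get(Ut) unbox whatever the map returns. If rows arrive out of order, a second "stop" can be reached before a new "start". The sketch below reproduces that pattern in isolation; elapsedUnsafe and elapsedSafe are hypothetical helper names, not part of the original UDF.

```java
import java.util.HashMap;
import java.util.Map;

public class NullUnboxingDemo {

    // Mirrors the pattern (ts - ts_last_start.get(Ut)) / 1000 from the UDF:
    // the map returns a Long object, and the subtraction forces an unboxing.
    static long elapsedUnsafe(Map<String, Long> lastStart, String ut, long ts) {
        return (ts - lastStart.get(ut)) / 1000; // throws if the value is null
    }

    // Hypothetical defensive variant: returns null when no start is known
    // instead of throwing.
    static Long elapsedSafe(Map<String, Long> lastStart, String ut, long ts) {
        Long start = lastStart.get(ut);
        if (start == null) {
            return null;
        }
        return (ts - start) / 1000;
    }

    public static void main(String[] args) {
        Map<String, Long> lastStart = new HashMap<>();
        lastStart.put("a", null); // what the UDF does after a "stop"

        System.out.println(elapsedSafe(lastStart, "a", 9000L));
        try {
            elapsedUnsafe(lastStart, "a", 9000L);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from unboxing a null Long");
        }
    }
}
```

A guard like the one in elapsedSafe would not make the states correct when rows arrive out of order, but it would turn the failed evaluation into a value that is easier to spot and debug.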
