
Translate PL/SQL ETL process into HiveQL

I am trying to translate different PL/SQL scripts into HiveQL.

These scripts are used in an ETL process to import data from and to different tables.

I am trying to do the same thing in Hadoop/Hive, using HiveQL.

But one of these scripts is giving me some problems.

Here is my PL/SQL script:

BEGIN

  -- Update at the store (magasin) and family (famille) level
  MERGE INTO KPI.THM_CA_RGRP_PRODUITS_JOUR cible USING (
    SELECT
      in_co_societe                                               as CO_SOCIETE,
      in_dt_jour                                                  as DT_JOUR,
      'MAG'                                                       as TYPE_ENTITE,
      m.co_magasin                                                as CODE_ENTITE,
      'FAM'                                                       as TYPE_RGRP_PRODUITS,
      sourceunion.CO_RGRP_PRODUITS                                as CO_RGRP_PRODUITS,
      SUM(MT_CA_NET_TTC)                                          as MT_CA_NET_TTC,
      SUM(MT_OBJ_CA_NET_TTC)                                      as MT_OBJ_CA_NET_TTC,
      SUM(NB_CLIENTS)                                             as NB_CLIENTS,
      SUM(MT_CA_NET_TTC_COMP)                                     as MT_CA_NET_TTC_COMP,
      SUM(MT_OBJ_CA_NET_TTC_COMP)                                 as MT_OBJ_CA_NET_TTC_COMP,
      SUM(NB_CLIENTS_COMP)                                        as NB_CLIENTS_COMP
    FROM (
      -- Update revenue (CA)
      SELECT
        mtransf.id_mag_transfere             as ID_MAGASIN,
        v.co_famille                         as CO_RGRP_PRODUITS,
        sum(v.mt_ca_net_ttc)                 as MT_CA_NET_TTC,
        0                                    as MT_OBJ_CA_NET_TTC,
        0                                    as NB_CLIENTS,
        sum(v.mt_ca_net_ttc * DECODE(mtransf.flag_mag_comp, 'NC', 0, 1))
                                             as MT_CA_NET_TTC_COMP,
        0                                    as MT_OBJ_CA_NET_TTC_COMP,
        0                                    as NB_CLIENTS_COMP
      FROM themis.VENTES_FAM v
      INNER JOIN kpi.kpi_magasin mtransf
      ON  mtransf.co_societe = CASE WHEN v.co_societe = 1 THEN 1 ELSE 2 END
      AND mtransf.id_magasin = v.id_magasin
      WHERE
          mtransf.co_societe    = in_co_societe
      AND v.dt_jour             = in_dt_jour
      GROUP BY
        mtransf.id_mag_transfere,
        v.co_famille
      UNION
      -- Update targets -> no, because targets are not defined at the family level
      -- Update the number of clients
      SELECT
        mtransf.id_mag_transfere             as ID_MAGASIN,
        v.co_famille                         as CO_RGRP_PRODUITS,
        0                                    as MT_CA_NET_TTC,
        0                                    as MT_OBJ_CA_NET_TTC,
        sum(nb_client)                       as NB_CLIENTS,
        0                                    as MT_CA_NET_TTC_COMP,
        0                                    as MT_OBJ_CA_NET_TTC_COMP,
        sum(nb_client * DECODE(mtransf.flag_mag_comp, 'NC', 0, 1))
                                             as NB_CLIENTS_COMP
      FROM ods.nb_clients_mag_fam_j v
      INNER JOIN kpi.kpi_magasin mtransf
      ON  mtransf.co_societe = CASE WHEN v.co_societe = 1 THEN 1 ELSE 2 END
      AND mtransf.id_magasin = v.id_magasin
      WHERE
          mtransf.co_societe    = in_co_societe
      AND v.dt_jour             = in_dt_jour
      GROUP BY
        mtransf.id_mag_transfere,
        v.co_famille
    ) sourceunion
    INNER JOIN kpi.kpi_magasin m
    ON  m.co_societe = in_co_societe
    AND m.id_magasin = sourceunion.id_magasin
    GROUP BY
      m.co_magasin,
      sourceunion.CO_RGRP_PRODUITS
  ) source
  ON (
        cible.co_societe  = source.co_societe
    and cible.dt_jour     = source.dt_jour
    and cible.type_entite = source.type_entite
    and cible.code_entite = source.code_entite
    and cible.type_rgrp_produits = source.type_rgrp_produits
    and cible.co_rgrp_produits = source.co_rgrp_produits
  )
 WHEN NOT MATCHED THEN
    INSERT (
      cible.CO_SOCIETE,
      cible.DT_JOUR,
      cible.TYPE_ENTITE,
      cible.CODE_ENTITE,
      cible.TYPE_RGRP_PRODUITS,
      cible.CO_RGRP_PRODUITS,
      cible.MT_CA_NET_TTC,
      cible.MT_OBJ_CA_NET_TTC,
      cible.NB_CLIENTS,
      cible.MT_CA_NET_TTC_COMP,
      cible.MT_OBJ_CA_NET_TTC_COMP,
      cible.NB_CLIENTS_COMP
    )
    VALUES (
      source.CO_SOCIETE,
      source.DT_JOUR,
      source.TYPE_ENTITE,
      source.CODE_ENTITE,
      source.TYPE_RGRP_PRODUITS,
      source.CO_RGRP_PRODUITS,
      source.MT_CA_NET_TTC,
      source.MT_OBJ_CA_NET_TTC,
      source.NB_CLIENTS,
      source.MT_CA_NET_TTC_COMP,
      source.MT_OBJ_CA_NET_TTC_COMP,
      source.NB_CLIENTS_COMP
    );

END;

Is there a way to do this with Hive?

Thanks for your help.

The PL/SQL statement in your question is a bit too long for such a general question. I might have trouble following it, but my understanding is that you are inserting the results of a query into the KPI.THM_CA_RGRP_PRODUITS_JOUR table, unless they match existing rows.

Hive does not support row-level updates or MERGE out of the box, and appending to existing HDFS files is not well supported, but you can tell Hive to treat some HDFS directories as partitions of a table.

The word "JOUR" in the name of your table suggests that its data can be naturally partitioned by day. I would suggest doing the E and T steps in your source system, i.e. generating, say, a CSV file with the results of the SELECT, and then loading it into HDFS. If you do daily exports and can narrow down the records to be inserted on the source side, you would only have to tell Hive that you are adding a new partition to the table.
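As a sketch of that approach (column names are taken from your MERGE; the exact types, file format, and paths are assumptions, not something from your schema), the day-partitioned target table and the daily load could look like this in HiveQL:

```sql
-- Hypothetical day-partitioned version of the target table.
CREATE TABLE IF NOT EXISTS thm_ca_rgrp_produits_jour (
  co_societe             INT,
  type_entite            STRING,
  code_entite            STRING,
  type_rgrp_produits     STRING,
  co_rgrp_produits       STRING,
  mt_ca_net_ttc          DOUBLE,
  mt_obj_ca_net_ttc      DOUBLE,
  nb_clients             DOUBLE,
  mt_ca_net_ttc_comp     DOUBLE,
  mt_obj_ca_net_ttc_comp DOUBLE,
  nb_clients_comp        DOUBLE
)
PARTITIONED BY (dt_jour STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Register one day's CSV export (already copied to HDFS) as a new partition.
LOAD DATA INPATH '/staging/thm_ca_rgrp_produits_jour/2013-06-30.csv'
INTO TABLE thm_ca_rgrp_produits_jour
PARTITION (dt_jour = '2013-06-30');
```

With this layout, each daily export becomes one partition, and re-running a day is just overwriting that partition rather than merging into a monolithic table.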

If you have to filter out records that are already present in your table, which is likely the reason you are using a MERGE instead of a straight INSERT, you might need to write a simple Map/Reduce job to merge the new data with the existing data.
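Since your MERGE only has a WHEN NOT MATCHED branch, it can also be emulated in plain HiveQL with an anti-join instead of a Map/Reduce job. A sketch, assuming both tables live in Hive and `new_rows` is a table or view holding the aggregated source query with the same column order as the target:

```sql
-- Rewrite the table as: all existing rows, plus only those source rows
-- that have no match on the merge key (emulates WHEN NOT MATCHED THEN INSERT).
INSERT OVERWRITE TABLE thm_ca_rgrp_produits_jour
SELECT * FROM thm_ca_rgrp_produits_jour
UNION ALL
SELECT s.*
FROM new_rows s
LEFT OUTER JOIN thm_ca_rgrp_produits_jour c
  ON  c.co_societe         = s.co_societe
  AND c.dt_jour            = s.dt_jour
  AND c.type_entite        = s.type_entite
  AND c.code_entite        = s.code_entite
  AND c.type_rgrp_produits = s.type_rgrp_produits
  AND c.co_rgrp_produits   = s.co_rgrp_produits
WHERE c.co_societe IS NULL;
```

Note that this rewrites the whole table (or, better, just the affected partition if you partition by dt_jour). On recent Hive versions (2.2+) with ACID transactional tables, MERGE INTO is supported natively, so the original statement could be ported much more directly.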
