I'm trying to translate different PL/SQL scripts into HiveQL.
These scripts are used in an ETL process to import data from and to different tables.
I'm trying to do the same thing in Hadoop/Hive, using HiveQL,
but one of these scripts is giving me trouble.
Here is my PL/SQL script:
BEGIN
-- Update at the store and family level
MERGE INTO KPI.THM_CA_RGRP_PRODUITS_JOUR cible USING (
SELECT
in_co_societe as CO_SOCIETE,
in_dt_jour as DT_JOUR,
'MAG' as TYPE_ENTITE,
m.co_magasin as CODE_ENTITE,
'FAM' as TYPE_RGRP_PRODUITS,
sourceunion.CO_RGRP_PRODUITS as CO_RGRP_PRODUITS,
SUM(MT_CA_NET_TTC) as MT_CA_NET_TTC,
SUM(MT_OBJ_CA_NET_TTC) as MT_OBJ_CA_NET_TTC,
SUM(NB_CLIENTS) as NB_CLIENTS,
SUM(MT_CA_NET_TTC_COMP) as MT_CA_NET_TTC_COMP,
SUM(MT_OBJ_CA_NET_TTC_COMP) as MT_OBJ_CA_NET_TTC_COMP,
SUM(NB_CLIENTS_COMP) as NB_CLIENTS_COMP
FROM (
-- Update revenue (CA)
SELECT
mtransf.id_mag_transfere as ID_MAGASIN,
v.co_famille as CO_RGRP_PRODUITS,
sum(v.mt_ca_net_ttc) as MT_CA_NET_TTC,
0 as MT_OBJ_CA_NET_TTC,
0 as NB_CLIENTS,
sum(v.mt_ca_net_ttc * DECODE(mtransf.flag_mag_comp, 'NC', 0, 1))
as MT_CA_NET_TTC_COMP,
0 as MT_OBJ_CA_NET_TTC_COMP,
0 as NB_CLIENTS_COMP
FROM themis.VENTES_FAM v
INNER JOIN kpi.kpi_magasin mtransf
ON mtransf.co_societe = CASE WHEN v.co_societe = 1 THEN 1 ELSE 2 END
AND mtransf.id_magasin = v.id_magasin
WHERE
mtransf.co_societe = in_co_societe
AND v.dt_jour = in_dt_jour
GROUP BY
mtransf.id_mag_transfere,
v.co_famille
UNION
-- Update targets -> skipped, because targets are not defined at the family level
-- Update the number of clients
SELECT
mtransf.id_mag_transfere as ID_MAGASIN,
v.co_famille as CO_RGRP_PRODUITS,
0 as MT_CA_NET_TTC,
0 as MT_OBJ_CA_NET_TTC,
sum(nb_client) as NB_CLIENTS,
0 as MT_CA_NET_TTC_COMP,
0 as MT_OBJ_CA_NET_TTC_COMP,
sum(nb_client * DECODE(mtransf.flag_mag_comp, 'NC', 0, 1))
as NB_CLIENTS_COMP
FROM ods.nb_clients_mag_fam_j v
INNER JOIN kpi.kpi_magasin mtransf
ON mtransf.co_societe = CASE WHEN v.co_societe = 1 THEN 1 ELSE 2 END
AND mtransf.id_magasin = v.id_magasin
WHERE
mtransf.co_societe = in_co_societe
AND v.dt_jour = in_dt_jour
GROUP BY
mtransf.id_mag_transfere,
v.co_famille
) sourceunion
INNER JOIN kpi.kpi_magasin m
ON m.co_societe = in_co_societe
AND m.id_magasin = sourceunion.id_magasin
GROUP BY
m.co_magasin,
sourceunion.CO_RGRP_PRODUITS
) source
ON (
cible.co_societe = source.co_societe
and cible.dt_jour = source.dt_jour
and cible.type_entite = source.type_entite
and cible.code_entite = source.code_entite
and cible.type_rgrp_produits = source.type_rgrp_produits
and cible.co_rgrp_produits = source.co_rgrp_produits
)
WHEN NOT MATCHED THEN
INSERT (
cible.CO_SOCIETE,
cible.DT_JOUR,
cible.TYPE_ENTITE,
cible.CODE_ENTITE,
cible.TYPE_RGRP_PRODUITS,
cible.CO_RGRP_PRODUITS,
cible.MT_CA_NET_TTC,
cible.MT_OBJ_CA_NET_TTC,
cible.NB_CLIENTS,
cible.MT_CA_NET_TTC_COMP,
cible.MT_OBJ_CA_NET_TTC_COMP,
cible.NB_CLIENTS_COMP
)
VALUES (
source.CO_SOCIETE,
source.DT_JOUR,
source.TYPE_ENTITE,
source.CODE_ENTITE,
source.TYPE_RGRP_PRODUITS,
source.CO_RGRP_PRODUITS,
source.MT_CA_NET_TTC,
source.MT_OBJ_CA_NET_TTC,
source.NB_CLIENTS,
source.MT_CA_NET_TTC_COMP,
source.MT_OBJ_CA_NET_TTC_COMP,
source.NB_CLIENTS_COMP
);
Is there a way to do this with Hive?
Thanks for your help.
The PL/SQL statement in your question is a bit long for such a general question, so I may be missing some details, but my understanding is that you are inserting the results of a query into the KPI.THM_CA_RGRP_PRODUITS_JOUR table, unless they match existing rows.
Hadoop does not readily support appending to existing HDFS files, but you can tell Hive to treat certain HDFS directories as partitions of a table.
The word "JOUR" in the name of your table suggests that the data in it can be naturally partitioned by day. I would suggest doing the E and T steps in your source system, i.e. generating, say, a CSV file with the results of the SELECT, then loading it into HDFS. If you do daily exports and can narrow down the records to be inserted on the source side, you would only have to tell Hive that you are adding a new partition to the table.
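For example, a sketch of what the partitioned table and the daily load could look like in HiveQL. The column list, file path, and date are assumptions for illustration, not taken from your schema:

```sql
-- Sketch only: a day-partitioned Hive table mirroring a few of the Oracle columns.
CREATE TABLE thm_ca_rgrp_produits_jour (
  co_societe         INT,
  type_entite        STRING,
  code_entite        STRING,
  type_rgrp_produits STRING,
  co_rgrp_produits   STRING,
  mt_ca_net_ttc      DOUBLE,
  nb_clients         BIGINT
)
PARTITIONED BY (dt_jour STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Each daily CSV export becomes a new partition:
LOAD DATA INPATH '/staging/thm_ca/2013-01-15.csv'
INTO TABLE thm_ca_rgrp_produits_jour
PARTITION (dt_jour = '2013-01-15');
```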
If you have to filter out records that are already present in your table, which is likely why you are using a MERGE instead of a straight INSERT, you might need to write a simple Map/Reduce job to merge the new data with the existing data.
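Since your MERGE only has a WHEN NOT MATCHED branch, it can also be approximated in HiveQL itself with an anti-join: insert only the source rows whose key does not already exist in the target. A hedged sketch, where `source_data` is a placeholder for the big SELECT in your script (e.g. materialized as a staging table first):

```sql
-- Sketch: emulate MERGE ... WHEN NOT MATCHED THEN INSERT with a left outer join.
-- Rows that find no match in the target have all target columns NULL,
-- so filtering on a target key column keeps only the new rows.
INSERT INTO TABLE thm_ca_rgrp_produits_jour
SELECT s.co_societe, s.dt_jour, s.type_entite, s.code_entite,
       s.type_rgrp_produits, s.co_rgrp_produits,
       s.mt_ca_net_ttc, s.mt_obj_ca_net_ttc, s.nb_clients,
       s.mt_ca_net_ttc_comp, s.mt_obj_ca_net_ttc_comp, s.nb_clients_comp
FROM source_data s
LEFT OUTER JOIN thm_ca_rgrp_produits_jour t
  ON  t.co_societe         = s.co_societe
  AND t.dt_jour            = s.dt_jour
  AND t.type_entite        = s.type_entite
  AND t.code_entite        = s.code_entite
  AND t.type_rgrp_produits = s.type_rgrp_produits
  AND t.co_rgrp_produits   = s.co_rgrp_produits
WHERE t.co_rgrp_produits IS NULL;
```

Note that this assumes the MERGE key columns are non-nullable in the target, so `t.co_rgrp_produits IS NULL` reliably identifies unmatched rows.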