
PutHiveQL NiFi Processor extremely slow - misconfiguration?

I am setting up a simple NiFi flow that reads from an RDBMS source and writes to a Hive sink. The flow works as expected up to the PutHiveQL processor, which runs extremely slowly: it inserts approximately one record per minute.
NiFi is currently set up as a standalone instance running on one node.


The logs show an insert approximately every minute:

( INSERT INTO customer (id, name, address) VALUES (x, x, x) )

Any ideas about why this may be? Improvements to try?

Thanks in advance

Inserting one record at a time into Hive results in extreme slowness.

Since you are doing regular inserts into a Hive table, change your flow to:

QueryDatabaseTable
PutHDFS

Then create a Hive Avro table on top of the HDFS directory where you stored the data.
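As a rough sketch of that last step, the table definition might look like the following. The table name customer_avro, the column types, and the path /data/nifi/customer are assumptions for illustration only; use the directory configured in PutHDFS and the types from your source schema. STORED AS AVRO needs Hive 0.14 or later.

```sql
-- Hypothetical external table over the directory PutHDFS writes to.
-- Column names come from the INSERT in the question; the types are assumed.
CREATE EXTERNAL TABLE customer_avro (
  id      INT,
  name    STRING,
  address STRING
)
STORED AS AVRO
LOCATION '/data/nifi/customer';  -- assumed path, replace with your PutHDFS directory
```

Because the table is EXTERNAL, new Avro files that NiFi drops into the directory become queryable without any further INSERT, and dropping the table leaves the files in place.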

(or)

QueryDatabaseTable
ConvertAvroToORC // in case you need to store the data in ORC format
PutHDFS

Then create a Hive ORC table on top of the HDFS directory where you stored the data.
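The ORC variant could be sketched the same way. Again, the table name customer_orc, the column types, and the path are assumptions; the declared columns need to match what ConvertAvroToORC produced from the Avro schema.

```sql
-- Hypothetical external ORC table over the directory PutHDFS writes to.
-- Column types are assumed and must agree with the converted ORC files.
CREATE EXTERNAL TABLE customer_orc (
  id      INT,
  name    STRING,
  address STRING
)
STORED AS ORC
LOCATION '/data/nifi/customer_orc';  -- assumed path, replace with your PutHDFS directory
```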

Are you pushing one record at a time? If so, you can use the MergeRecord processor to create batches before they reach PutHiveQL.

It is recommended to batch in groups of 100 records. See: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hive-nar/1.5.0/org.apache.nifi.processors.hive.PutHiveQL/

Batch Size | 100 | The preferred number of FlowFiles to put to the database in a single transaction

Use the MergeRecord processor and set the number of records and/or a timeout; this should speed things up considerably.
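To illustrate why batching matters, compare the single-row statement from the logs with one statement that carries several rows. This is only a sketch: the values are made-up placeholders, and it assumes a Hive version that accepts multi-row INSERT ... VALUES with a column list. How the batched statement gets built inside the flow is not shown here.

```sql
-- One statement per record: each one pays Hive's per-statement overhead.
INSERT INTO customer (id, name, address) VALUES (1, 'name-1', 'address-1');

-- Several records in a single statement: the overhead is paid once.
-- All values below are placeholders, not real data.
INSERT INTO customer (id, name, address) VALUES
  (2, 'name-2', 'address-2'),
  (3, 'name-3', 'address-3'),
  (4, 'name-4', 'address-4');
```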
