
Java Heap size error in Sqoop import

I have been trying to import data from a MySQL database into Hive using Sqoop. The table gets created, and I have set --fetch-size as low as 10. Every time I run the command, I get a "Java heap space" error and the job is killed after 4 attempts. How can I fix this?

My Sqoop command is as follows:

sqoop import --connect jdbc:mysql://my_local_ip/mydatabase --fetch-size 10  --username root -P --table table_name --hive-import --compression-codec=snappy --as-parquetfile  -m 1

and I get the following output:

16/08/29 07:06:24 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1472465929944_0013/
16/08/29 07:06:24 INFO mapreduce.Job: Running job: job_1472465929944_0013
16/08/29 07:06:47 INFO mapreduce.Job: Job job_1472465929944_0013 running in uber mode : false
16/08/29 07:06:47 INFO mapreduce.Job:  map 0% reduce 0%
16/08/29 07:07:16 INFO mapreduce.Job: Task Id : attempt_1472465929944_0013_m_000000_0, Status : FAILED
Error: Java heap space
16/08/29 07:07:37 INFO mapreduce.Job: Task Id : attempt_1472465929944_0013_m_000000_1, Status : FAILED
Error: Java heap space
16/08/29 07:07:59 INFO mapreduce.Job: Task Id : attempt_1472465929944_0013_m_000000_2, Status : FAILED
Error: Java heap space
16/08/29 07:08:21 INFO mapreduce.Job:  map 100% reduce 0%
16/08/29 07:08:23 INFO mapreduce.Job: Job job_1472465929944_0013 failed with state FAILED due to: Task failed task_1472465929944_0013_m_000000

Try giving the map tasks more memory and splitting the import across more mappers:

sqoop import  -Dmapreduce.map.memory.mb=1024 -Dmapreduce.map.java.opts=-Xmx7200m -Dmapreduce.task.io.sort.mb=2400 --connect jdbc:mysql://local.ip/database_name --username root -P --hive-import --table table_name --as-parquetfile --warehouse-dir=/home/cloudera/hadoop --split-by 'id' -m 100
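One thing worth checking in the command above: mapreduce.map.memory.mb is the size of the YARN container for each map task, while mapreduce.map.java.opts sets the heap of the JVM that runs inside that container, so the heap is normally kept below the container size. In the command above the heap (-Xmx7200m) is larger than the container (1024 MB), and YARN may kill a task whose JVM grows past the container limit. A minimal sketch of keeping the two values consistent, assuming a 4096 MB container and a heap of roughly 80% of it (both numbers are illustrative assumptions, not values from this thread):

```shell
# Assumed values for illustration only; tune them to your cluster.
CONTAINER_MB=4096                       # YARN container size per map task
HEAP_MB=$((CONTAINER_MB * 80 / 100))    # JVM heap kept at about 80% of the container

# Generic Hadoop options (-D...) must come right after "sqoop import",
# before the import-specific arguments.
SQOOP_OPTS="-Dmapreduce.map.memory.mb=${CONTAINER_MB} -Dmapreduce.map.java.opts=-Xmx${HEAP_MB}m"

# Print the command instead of running it, so it can be reviewed first.
echo "sqoop import ${SQOOP_OPTS} --connect jdbc:mysql://local.ip/database_name --username root -P --table table_name --hive-import --as-parquetfile --split-by id -m 100"
```

The remaining headroom in the container is left for non-heap JVM memory; the right ratio depends on the workload, so treat 80% as a starting point rather than a rule.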

Initially I was using 10 mappers to process 10 million records, so each chunk held 1 million records. This was causing the error. When I ran the import with 100 mappers instead, the data was processed successfully. The only downside I noticed is the run time: it took almost 1 hour to complete all 100 map tasks.
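The arithmetic behind this can be sketched as follows. With --split-by and -m N, Sqoop divides the range of the split column into N parts, so if the id values are evenly distributed each mapper handles roughly total/N rows. The helper name rows_per_mapper is hypothetical and only used for illustration:

```shell
# Rough rows-per-mapper estimate, assuming the split column is evenly distributed.
rows_per_mapper() {
  total_rows=$1
  mappers=$2
  echo $((total_rows / mappers))
}

TOTAL_ROWS=10000000   # 10 million records, as in the post
for MAPPERS in 10 100; do
  echo "${MAPPERS} mappers: about $(rows_per_mapper "$TOTAL_ROWS" "$MAPPERS") rows per mapper"
done
```

Smaller chunks lower the memory each map task needs, but if the cluster can only run a few containers at a time, the 100 tasks run in waves, which is one likely reason for the longer total run time.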
