
MapReduce Job continues to run with map = 0%, reduce = 0% for hours

I am running a Hive query that looks like this:

create table table1 as select split(comments,' ') as words from table2;

The comments column holds review comments as strings of space-separated words.

When I run this query, a MapReduce job starts and stays at map = 0% for hours. It does not give any error during this process.

hive> create table jw_1 as select split(comments,' ') from removed_null_values;
Query ID = xxx-190418201314_7781cf59-6afb-4e82-ab75-c7e343c4985e
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1555607912038_0013, Tracking URL = http://xxx-VirtualBox:8088/proxy/application_1555607912038_0013/
Kill Command = /usr/local/bin/hadoop-3.2.0/bin/mapred job  -kill job_1555607912038_0013
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-18 20:13:30,568 Stage-1 map = 0%,  reduce = 0%
2019-04-18 20:14:31,140 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 39.6 sec
2019-04-18 20:15:31,311 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 101.64 sec
2019-04-18 20:16:31,451 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 146.5 sec
2019-04-18 20:17:31,684 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 212.08 sec

However, when I try

select split(comments,' ') from table2;

I can see the comments as an array in the shell.

["\"Lauren","was","promptly","responsive","in","advance","of","our","booking.","providing","a","lot","of","helpful","info.","And","she","stayed","in","contact","and","was","readily","available","prior","to","and","during","our","stay.","which","was","awesome.","The","location.","price","and","privacy","were","the","real","benefits."]

I have also run a few other queries where the MapReduce jobs complete and produce the desired result.

I am currently using Hive 3.1.1.

Basically, I want to create a new table with an array column containing the words, and later tokenize that column.

I am new to Hive and am working on sentiment analysis of a 35 MB data file.

In your first case, you most likely don't have the resources necessary to complete the Hive query once it is converted to MapReduce. Check YARN (or MR1, if that is what you are running) to determine whether you have enough compute resources to run the MapReduce job.
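A minimal sketch of what that resource check might look like from a shell on the cluster, assuming the `yarn` CLI is on the PATH. The helper names `job_to_app` and `check_yarn` are made up for illustration. `job_to_app` relies only on the convention, visible in the Tracking URL of the log above, that a `job_...` ID and its `application_...` ID share the same numeric suffix.

```shell
#!/bin/sh
# Hypothetical helper: turn a MapReduce job ID into the matching YARN application ID.
# Assumes the convention seen in the question's log: only the prefix differs.
job_to_app() {
    printf 'application_%s\n' "${1#job_}"
}

# Hypothetical helper: show the application's status and the state of each node.
check_yarn() {
    app_id=$(job_to_app "$1")
    # State, progress and diagnostics for this one application.
    yarn application -status "$app_id"
    # Every NodeManager with its state and number of running containers.
    yarn node -list -all
}

# Usage (needs a running cluster):
#   check_yarn job_1555607912038_0013
```

An application that stays in the ACCEPTED state, or a node list with no NodeManager in the RUNNING state, would point toward a shortage of resources.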

In the second case, some Hive queries don't trigger MapReduce jobs, which is why that query returns right away. See "How does Hive decide when to use map reduce and when not to?" for more information.
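One way to see which path a query takes, sketched under the assumption that the `hive` CLI is on the PATH, using the table name from the question. `hive.fetch.task.conversion` is the Hive property that controls whether simple SELECTs are served by a fetch task instead of a MapReduce job. The variable and function names here are made up for illustration.

```shell
#!/bin/sh
# HiveQL to run: print the current fetch-conversion setting,
# then the execution plan for the plain SELECT from the question.
FETCH_CHECK_SQL="
set hive.fetch.task.conversion;
explain select split(comments,' ') from table2;
"

# Hypothetical wrapper: run the statements above through the Hive CLI.
show_fetch_plan() {
    hive -e "$FETCH_CHECK_SQL"
}

# Usage (needs a working Hive installation):
#   show_fetch_plan
```

If the plan lists only a fetch stage and no map-reduce stage, the SELECT is typically answered by reading the table directly. The CREATE TABLE ... AS SELECT has to write its output, and it ran as a job, as the log in the question shows.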


 