HiveQL query returns no results and no errors

I am running Apache Hadoop 2.6.0 on Ubuntu 14.0, and I created a table in Hive 0.13.0 as follows:

CREATE TABLE IF NOT EXISTS recipes_hive.cuisine (
ID INT COMMENT 'Cuisine ID.', 
name STRING COMMENT 'Cuisine name - primary key.', 
area STRING COMMENT 'Name of the area of origin - foreign key.', 
scope STRING COMMENT 'Either country or area.') 
COMMENT 'Table containing cuisines data.'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

I populate it with this statement:

LOAD DATA LOCAL INPATH 'path_to_file/CUISINE.csv'
OVERWRITE INTO TABLE recipes_hive.cuisine;

My database has several tables like this, all created and populated with the same procedure. When I run simple queries, such as:

SELECT * FROM cuisine

or even queries with some conditions in the WHERE clause, I get the expected results, but more complex queries return nothing. For example:

SELECT cuisine.name, SUM(IF(ingredient.category = "fruit", 1, 2)) / COUNT(*) AS PERC 
FROM cuisine JOIN recipe ON recipe.cuisine = cuisine.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient 
GROUP BY cuisine.name 
ORDER BY PERC DESC

or:

SELECT ingredient.id, ingredient.name 
FROM cuisine JOIN recipe ON recipe.cuisine = cuisine.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient 
WHERE ingredient.id IN (
SELECT ingredient.id 
FROM cuisine c JOIN recipe ON recipe.cuisine = c.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient 
WHERE c.name = "Pakistan") AND cuisine.name = "Bangladesh"

The first example calculates a percentage, and the second checks for mutually exclusive elements.

MapReduce and Hadoop are invoked correctly and return no errors. The output ends with:

Execution completed successfully
MapredLocal task succeeded
OK
Time taken: 122.119 seconds

I have searched the web, and other people have run into similar problems. I looked at:

Hive table returns an empty result set on all queries

Simple Hive query is empty

but neither solved my problem. The data really is in HDFS and, as mentioned above, simple queries can read it.

So either something is wrong with my Hive instance or my queries are not written correctly.

Any help would be much appreciated. Best regards.

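Are you sure the result of the join is non-empty? Because you are using inner joins, the whole result set is empty if even one of the tables has no matching records. Try a left join with "IS NULL" to verify that every table contributes to the result set. If all the joined tables have non-null values in their respective columns after the join, the query is fine.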
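As a rough sketch of that left-join check, using only the table and column names from the question: the query below lists cuisines that have no matching recipe. The aliases `c` and `r` are introduced here for illustration. This is one way to write the check, not necessarily the exact query the answer had in mind.

```sql
-- Cuisines with no matching row in recipe.
-- An INNER JOIN silently drops these rows; a LEFT OUTER JOIN keeps them
-- with NULLs on the recipe side, so IS NULL exposes them.
SELECT c.name
FROM cuisine c
LEFT OUTER JOIN recipe r ON r.cuisine = c.name
WHERE r.cuisine IS NULL;
```

If this returns every cuisine, the join key `recipe.cuisine = cuisine.name` never matches and every query built on that inner join will be empty. The same pattern can be repeated for the `recipe`/`part_of` and `part_of`/`ingredient` joins.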

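If we have a Cuisine table containing ID = {1,2,3} and a Recipe table containing ID = {5,6,7}, then even though both tables are non-empty, an INNER JOIN on Cuisine.ID = Recipe.ID still returns no rows (because the IDs in the two tables differ). Please check that this is not what is happening here.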

SELECT count(1)
FROM cuisine c JOIN recipe ON recipe.cuisine = c.name WHERE c.name = "Pakistan";

--- must return > 0 

select count(1) from recipe as recipe
JOIN part_of ON part_of.id_recipe = recipe.id ;

--- must return > 0 

select count(1) from part_of as part_of
JOIN ingredient ON ingredient.name = part_of.ingredient ;

--- must return > 0 

So when all of the counts above are non-zero, the inner query will return a row. Now test the outer select:

SELECT ingredient.id, ingredient.name 
FROM cuisine JOIN recipe ON recipe.cuisine = cuisine.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient 
WHERE ingredient.id = <inner query result> and cuisine.name = "Bangladesh";
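The ID example above (Cuisine with {1,2,3}, Recipe with {5,6,7}) can also be checked by comparing how many distinct join keys exist against how many of them find a match. The sketch below does this for the first join in the question. The aliases and output column names are made up for illustration, and it assumes Hive's default setting of `hive.groupby.skewindata=false`, since Hive restricts multiple DISTINCT aggregates when that option is enabled.

```sql
-- Compare distinct cuisine names with the subset that appears in recipe.
-- COUNT(DISTINCT ...) ignores NULLs, so unmatched cuisines are not
-- counted in names_with_recipes.
SELECT
  COUNT(DISTINCT c.name)    AS cuisine_names,
  COUNT(DISTINCT r.cuisine) AS names_with_recipes
FROM cuisine c
LEFT OUTER JOIN recipe r ON r.cuisine = c.name;
```

A `names_with_recipes` value of 0 alongside a non-zero `cuisine_names` would mean the two key columns share no values, which is the situation described in the ID example.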
