
Hive Map-Join configuration mystery

Could someone clearly explain the difference between the

hive.auto.convert.join

and

hive.auto.convert.join.noconditionaltask

configuration parameters?

Also, what about the corresponding size parameters:

hive.mapjoin.smalltable.filesize

and

hive.auto.convert.join.noconditionaltask.size

My observation is that, when running on Tez, the map join works when hive.auto.convert.join.noconditionaltask.size is set to a high enough value, even when hive.mapjoin.smalltable.filesize is set lower than the size of the small table.

Why do we need both hive.auto.convert.join and hive.auto.convert.join.noconditionaltask?

The Apache documentation is very confusing.

These parameters decide when Hive uses a map join instead of a common join, which ultimately affects query performance.

A map join is used when one of the joined tables is small enough to fit in memory, which makes it very fast. Here is an explanation of each parameter:

hive.auto.convert.join

When this parameter is set to true, Hive automatically checks whether the file size of the smaller table exceeds the value specified by hive.mapjoin.smalltable.filesize. If it does, the query is executed as a common join; otherwise the join is converted to a map join. Once auto convert join is enabled, there is no need to provide map join hints in the query.
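As a minimal sketch, assuming two hypothetical tables, orders (large) and customers (small), the setting can be switched on for a session and the resulting plan inspected with EXPLAIN:

```sql
-- Hypothetical tables: orders (large fact table), customers (small lookup table).
SET hive.auto.convert.join=true;

-- No MAPJOIN hint is written; the optimizer decides from the table sizes.
EXPLAIN
SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```

If the join was converted, the plan typically shows a Map Join Operator rather than a join performed in a reduce stage.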

hive.auto.convert.join.noconditionaltask

When three or more tables are involved in a join:

With hive.auto.convert.join = true, Hive generates three or more map-side joins, on the assumption that all the tables are small.

With hive.auto.convert.join.noconditionaltask = true as well, Hive combines three or more map-side joins into a single map-side join if the combined size of the n-1 smaller tables is below the threshold set by hive.auto.convert.join.noconditionaltask.size (10 MB by default).
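The multi-table case can be sketched as follows. The tables sales, products and stores are hypothetical, with sales assumed to be the large table and the other two assumed to be small:

```sql
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;
-- Size budget in bytes for the n-1 small tables taken together (10 MB here).
SET hive.auto.convert.join.noconditionaltask.size=10000000;

-- Hypothetical three-way join: products and stores are the n-1 small tables.
EXPLAIN
SELECT s.sale_id, p.product_name, st.city
FROM sales s
JOIN products p ON s.product_id = p.product_id
JOIN stores st ON s.store_id = st.store_id;
```

If products and stores together fit within the budget, the expectation is a single map-side join in the plan instead of one per small table.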

hive.mapjoin.smalltable.filesize

This setting tells the optimizer what counts as a small table on your system. When a query runs, Hive uses this value to determine whether a join is eligible for conversion to a map join.

hive.auto.convert.join.noconditionaltask.size

This size setting lets the user control what size of table can fit in memory. The value represents the sum of the sizes of the tables that can be converted to in-memory hashmaps.
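The situation described in the question can be reproduced with settings of the following shape. This is only a sketch: the 50 MB table size is an assumption made for illustration, and both values are in bytes.

```sql
-- Assumption: the small table in the join is roughly 50 MB.
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;

-- Set below the small table's size.
SET hive.mapjoin.smalltable.filesize=25000000;
-- Set above the small table's size.
SET hive.auto.convert.join.noconditionaltask.size=100000000;

-- Running SET with a name and no value prints the current setting.
SET hive.auto.convert.join.noconditionaltask.size;
```

With values arranged like this, the question reports that on Tez the join still runs as a map join, which suggests hive.auto.convert.join.noconditionaltask.size is the threshold that matters in that case.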

Here is a very good explanation, with an example, that describes all 4 parameters:

http://www.openkb.info/2016/01/difference-between-hivemapjoinsmalltabl.html
