Hive Map-Join configuration mystery
Could someone clearly explain the difference between the hive.auto.convert.join and hive.auto.convert.join.noconditionaltask configuration parameters?
And between the corresponding size parameters, hive.mapjoin.smalltable.filesize and hive.auto.convert.join.noconditionaltask.size?
My observation is that, when running on Tez, Map-Join works when hive.auto.convert.join.noconditionaltask.size is set to a high enough value, even when hive.mapjoin.smalltable.filesize is set to less than the size of the small table.
Why do we need both hive.auto.convert.join and hive.auto.convert.join.noconditionaltask?
The Apache documentation is very confusing.
These parameters are used to decide when Hive uses a map join instead of a common join, which ultimately affects query performance.
A map join is used when one of the join tables is small enough to fit in memory, so it is very fast.
Here is an explanation of all the parameters:
hive.auto.convert.join
When this parameter is set to true, Hive automatically checks whether the file size of the smaller table is larger than the value specified by hive.mapjoin.smalltable.filesize. If it is larger, the query executes through a common join; otherwise the join is eligible for conversion to a map join. Once auto convert join is enabled, there is no need to provide map join hints in the query.
hive.auto.convert.join.noconditionaltask
When three or more tables are involved in a join, and
hive.auto.convert.join = true - Hive generates three or more map-side joins, on the assumption that all of the tables are small.
hive.auto.convert.join.noconditionaltask = true - Hive combines the three or more map-side joins into a single map-side join if the combined size of n-1 of the tables is less than the threshold (10 MB by default). The threshold is defined by hive.auto.convert.join.noconditionaltask.size.
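The n-1 rule above can be sketched in a few lines of Python. This is only a simplified model of the check, not Hive's actual implementation: the function name can_merge_into_single_map_join and the constant NOCONDITIONALTASK_SIZE are hypothetical, and the 10 MB default is taken from the description above.

```python
from typing import Sequence

# Assumed default for hive.auto.convert.join.noconditionaltask.size (10 MB).
NOCONDITIONALTASK_SIZE = 10_000_000


def can_merge_into_single_map_join(
    table_sizes: Sequence[int],
    threshold: int = NOCONDITIONALTASK_SIZE,
) -> bool:
    """Return True if the n-1 smallest tables together stay under the threshold.

    Hypothetical helper: it models the rule described in the text,
    not Hive's optimizer code.
    """
    if len(table_sizes) < 2:
        return False  # a join needs at least two inputs
    # The largest table is streamed; all other tables must fit in memory.
    in_memory = sorted(table_sizes)[:-1]
    return sum(in_memory) < threshold


# 4 MB + 3 MB = 7 MB, which is under 10 MB.
print(can_merge_into_single_map_join([500_000_000, 4_000_000, 3_000_000]))  # True
# 8 MB + 6 MB = 14 MB, which is over 10 MB.
print(can_merge_into_single_map_join([500_000_000, 8_000_000, 6_000_000]))  # False
```

Note that the comparison is against the sum of the smaller tables, not against each table individually, so two tables that are each under 10 MB can still fail the check together.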
hive.mapjoin.smalltable.filesize
This setting tells the optimizer what counts as a small table in your system. When a query executes, Hive uses this value to determine whether the join is eligible for conversion into a map join.
hive.auto.convert.join.noconditionaltask.size
This size setting lets the user control how large a table can be and still fit in memory. The value represents the sum of the sizes of the tables that can be converted to hash maps that fit in memory.
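To see how the four parameters might interact, here is a rough Python model of the decision. It is a sketch under stated assumptions, not Hive's real planner: the JoinConf class and plan_join function are hypothetical, the default values are the ones I understand Hive to ship with (check your version), and treating "the smaller table" as the combined size of every table except the largest is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JoinConf:
    """Hypothetical container; field names mirror the four Hive parameters."""
    auto_convert_join: bool = True               # hive.auto.convert.join
    noconditionaltask: bool = True               # hive.auto.convert.join.noconditionaltask
    noconditionaltask_size: int = 10_000_000     # ...noconditionaltask.size (assumed default)
    smalltable_filesize: int = 25_000_000        # hive.mapjoin.smalltable.filesize (assumed default)


def plan_join(table_sizes: Sequence[int], conf: JoinConf) -> str:
    """Simplified model of which join strategy the settings lead to."""
    if not conf.auto_convert_join:
        return "common join"
    # Assumption: every table except the largest must be held in memory.
    small_total = sum(sorted(table_sizes)[:-1])
    # Direct conversion: no conditional task is generated.
    if conf.noconditionaltask and small_total < conf.noconditionaltask_size:
        return "map join (no conditional task)"
    # Otherwise fall back to the small-table file size check.
    if small_total > conf.smalltable_filesize:
        return "common join"
    return "map join (conditional task)"


sizes = [2_000_000_000, 50_000_000]  # a 2 GB table joined with a 50 MB table

# Defaults: 50 MB exceeds both thresholds.
print(plan_join(sizes, JoinConf()))  # common join

# Raising only noconditionaltask.size is enough, even though
# smalltable.filesize (25 MB) is below the small table's size.
print(plan_join(sizes, JoinConf(noconditionaltask_size=100_000_000)))  # map join (no conditional task)
```

In this model the second call reproduces the behaviour described in the question: once the no-conditional-task check passes, hive.mapjoin.smalltable.filesize is never consulted.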
Here is a very good explanation that describes all 4 parameters with an example:
http://www.openkb.info/2016/01/difference-between-hivemapjoinsmalltabl.html