
Hadoop ORC table uses only one mapper all the time

Original post: 2016-01-26 15:28:28 | Tags: sql, hadoop, hive, orc, bigdata

In my current project I am working with ORC files compressed with Snappy. Whatever query I run, it runs with only one mapper. I tried configuring mapred.max.split.size and mapred.min.split.size, but the number of mappers did not change. The reducer count is fine, but because there is only a single mapper, even a simple query like

select x, max(y) from z group by x;

takes almost 20 minutes to complete the map phase. Is there anything else I should do to increase the number of mappers?

Please don't tell me to use partitions or buckets; I have already used them in my table.

1 answer

Try adjusting the orc.stripe.size table property (set through TBLPROPERTIES).

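The default stripe size is 256 MB (in Hive releases before 0.14; later releases default to 64 MB), and technically there is one mapper per stripe. By decreasing the size of a single stripe you can increase the number of mappers. The property only applies to data written after it is set, so existing files have to be rewritten before the change has any effect.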
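As a rough illustration of this suggestion, the HiveQL below writes the data into a copy of the table that uses a smaller stripe size. It is a minimal sketch under stated assumptions, not a tested recipe: the table name z_small_stripes and the 64 MB value are hypothetical choices for illustration, while z, x and y are the names from the question. A CTAS copy like this also does not carry over the partitioning or bucketing of the original table.

```sql
-- Hypothetical copy of table z with a smaller ORC stripe size.
-- orc.stripe.size is given in bytes: 67108864 = 64 MB (an assumed value, tune as needed).
CREATE TABLE z_small_stripes
STORED AS ORC
TBLPROPERTIES (
  "orc.compress" = "SNAPPY",
  "orc.stripe.size" = "67108864"
)
AS
SELECT * FROM z;

-- Run the original query against the rewritten data and compare the mapper count.
SELECT x, MAX(y)
FROM z_small_stripes
GROUP BY x;
```

To check how many stripes a file actually contains, the ORC file dump utility (hive --orcfiledump followed by the file path) lists them. How many mappers you end up with may also depend on the input format and split settings of your Hive version.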


Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address.
