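When running a MapReduce program against a database such as MySQL, I was wondering whether the query is first fired against the database and the result set fetched, and only then the splits are created, with each split processed by a separate mapper.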
I believe it first retrieves all the records and then creates the logical splits, as you can see from the signature of DBInputFormat.setInput():
public static void setInput(JobConf job,
Class<? extends DBWritable> inputClass,
String inputQuery,
String inputCountQuery)
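The splits themselves are only row ranges, so each mapper still has to read just its own range of the inputQuery result. Below is a minimal pure-Java sketch of that idea, assuming MySQL-style LIMIT/OFFSET paging. Hadoop's generic DBRecordReader appears to append a clause of this form, but dialect-specific readers (e.g. for Oracle) may differ. SplitQuery, forSplit and the users table are hypothetical names for illustration, not Hadoop API.

```java
/**
 * Hypothetical sketch (not Hadoop API): turns the user's inputQuery into a
 * bounded query that fetches only the rows [start, end) of one split.
 * Assumes MySQL-style LIMIT/OFFSET syntax.
 */
public final class SplitQuery {

    private SplitQuery() {}

    /**
     * The inputQuery should have a deterministic ORDER BY; otherwise two
     * mappers paging through the same result could see overlapping rows.
     */
    public static String forSplit(String inputQuery, long start, long end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid split [" + start + ", " + end + ")");
        }
        long length = end - start;
        // Only this slice of the ordered result is read by the mapper.
        return inputQuery + " LIMIT " + length + " OFFSET " + start;
    }

    public static void main(String[] args) {
        String q = "SELECT id, name FROM users ORDER BY id";
        System.out.println(forSplit(q, 0, 500));
        System.out.println(forSplit(q, 500, 1000));
    }
}
```

The paging approach keeps each mapper's query independent, at the cost of the database re-evaluating the ordered query for every split.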
It also takes an inputCountQuery (typically a SELECT COUNT(*) over the same rows). Hadoop uses its result, together with the configured number of map tasks, to decide how many records each mapper processes.
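To make the sizing concrete, here is a minimal sketch of turning the count into contiguous row ranges. It assumes the count is divided evenly across the requested number of map tasks and that the last split absorbs the remainder, which as far as I know mirrors DBInputFormat.getSplits(); it is a simplified illustration, not Hadoop's code. CountBasedSplits and Range are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: divides a row count into per-mapper ranges. */
public final class CountBasedSplits {

    /** A logical split: the half-open row range [start, end). */
    public record Range(long start, long end) {
        public long length() {
            return end - start;
        }
    }

    private CountBasedSplits() {}

    /**
     * Splits `count` rows (the result of the count query) into `numMaps`
     * contiguous ranges of count / numMaps rows each; the last range also
     * takes the remainder, so every row lands in exactly one split.
     */
    public static List<Range> split(long count, int numMaps) {
        if (count < 0 || numMaps < 1) {
            throw new IllegalArgumentException("count must be >= 0 and numMaps >= 1");
        }
        long chunkSize = count / numMaps;
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < numMaps; i++) {
            long start = i * chunkSize;
            long end = (i == numMaps - 1) ? count : start + chunkSize;
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // e.g. the count query returned 1003 rows and the job asks for 4 map tasks
        for (Range r : split(1003, 4)) {
            System.out.println(r);
        }
    }
}
```

With a count of 1003 and 4 map tasks this gives the ranges [0, 250), [250, 500), [500, 750) and [750, 1003): the number of mappers comes from the job configuration, while the count decides how many rows each one gets.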
Also read the "Limitations of the InputFormat" section here.