
How does DBInputFormat work with MySQL?

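When running a MapReduce program against a database such as MySQL, I would like to know whether the query is first fired on the database, the result set is fetched, and the splits are then created from it so that each split is processed by a separate mapper.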

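As far as I understand, it does not retrieve all the records up front. It first runs a count query to learn how many rows there are and creates the logical splits from that number; each mapper then runs the input query for its own range of rows, with a LIMIT and OFFSET appended. You can see both queries in the signature of setInput():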

public static void setInput(JobConf job,
                            Class<? extends DBWritable> inputClass,
                            String inputQuery,
                            String inputCountQuery)

Hadoop runs inputCountQuery to get the total number of rows, then divides that count by the configured number of map tasks to decide how many records each mapper processes.
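To make the split arithmetic concrete, here is a minimal sketch of how a row count could be divided into per-mapper ranges and turned into per-split queries. It is a simplified illustration, not Hadoop's source: the class SplitSketch, its methods, and the employees table are all hypothetical, and it assumes the last split absorbs any remainder rows and that each split's query is the input query with LIMIT and OFFSET appended.

```java
/**
 * Simplified illustration of dividing a row count into per-mapper ranges.
 * This is NOT Hadoop's source; SplitSketch and its methods are hypothetical.
 */
final class SplitSketch {

    /** Returns one {start, end} pair per split; end is exclusive. */
    static long[][] computeSplits(long rowCount, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        long chunkSize = rowCount / numSplits;
        long[][] splits = new long[numSplits][];
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunkSize;
            // Assumption: the last split takes whatever rows are left over.
            long end = (i + 1 == numSplits) ? rowCount : start + chunkSize;
            splits[i] = new long[] {start, end};
        }
        return splits;
    }

    /** Builds the query for one split by appending LIMIT/OFFSET (MySQL syntax). */
    static String queryForSplit(String inputQuery, long[] split) {
        long length = split[1] - split[0];
        return inputQuery + " LIMIT " + length + " OFFSET " + split[0];
    }

    public static void main(String[] args) {
        // Hypothetical input query; the ORDER BY keeps the row order stable.
        String inputQuery = "SELECT id, name FROM employees ORDER BY id";
        long rowCount = 10; // what the count query would have returned
        for (long[] split : computeSplits(rowCount, 3)) {
            System.out.println(queryForSplit(inputQuery, split));
        }
    }
}
```

With 10 rows and 3 splits, the ranges come out as rows 0 to 3, 3 to 6, and 6 to 10. The ORDER BY in the sample query matters: without a stable ordering, separate LIMIT/OFFSET queries are not guaranteed to see the rows in the same order, so splits could overlap or skip rows.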

Also read the "Limitations of the InputFormat" section here.
