
AWS EMR: do I need core nodes if I am planning to use EMRFS?

I am going through the online documentation and found the following difference between core nodes and task nodes.

  1. Core nodes have HDFS, while task nodes do not.

Because of this, AWS suggests that it is not a good idea to scale core nodes based on load, since HDFS rebalancing can take time, and that only task nodes should be scaled.

However, if I am planning to use EMRFS, do I need core nodes? What is the use of HDFS in this case if I am planning to access data from S3?

You need at least one core node.

If you want your job to write to local HDFS and then copy the results to S3 with s3distcp after it finishes, you need more core nodes.
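As a rough illustration of the layout described above, here is a minimal Python sketch that builds an instance-group list with one core node plus task nodes, and a step that copies HDFS output to S3. The dictionaries follow the shape that boto3's EMR `run_job_flow` call accepts for `Instances["InstanceGroups"]` and `Steps`. The helper names, the instance type, the node counts, and the `s3://my-bucket/...` path are assumptions chosen for the example, not values from the question.

```python
def build_instance_groups(core_count=1, task_count=2, instance_type="m5.xlarge"):
    """Build an InstanceGroups list (hypothetical helper, not part of boto3)."""
    # Per the answer above, keep at least one core node even when data lives in S3.
    if core_count < 1:
        raise ValueError("keep at least one core node")
    groups = [
        {"Name": "Primary", "InstanceRole": "MASTER",
         "InstanceType": instance_type, "InstanceCount": 1},
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    # Task nodes carry no HDFS, so this is the group to scale with load.
    if task_count > 0:
        groups.append(
            {"Name": "Task", "InstanceRole": "TASK",
             "InstanceType": instance_type, "InstanceCount": task_count}
        )
    return groups


def build_s3distcp_step(src, dest):
    """Build a step that runs s3-dist-cp through command-runner.jar (hypothetical helper)."""
    return {
        "Name": "Copy HDFS output to S3",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["s3-dist-cp", "--src", src, "--dest", dest],
        },
    }


# Assumed paths, for illustration only.
groups = build_instance_groups(core_count=1, task_count=4)
step = build_s3distcp_step("hdfs:///output", "s3://my-bucket/output")
```

These values could then be passed to `run_job_flow` as `Instances={"InstanceGroups": groups, ...}` and `Steps=[step]`, alongside the other arguments a real cluster needs (release label, IAM roles, and so on). If the job reads and writes S3 directly through EMRFS, the s3-dist-cp step is unnecessary and a single core node may be enough.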

