I am going through the online documentation, and I found the following difference between core nodes and task nodes.
Because of this, AWS suggests that scaling core nodes based on load is not a good idea, since HDFS rebalancing can take time; only task nodes should be scaled.
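Resizing only the task group can be sketched as below. This is a minimal sketch, assuming `client` is a boto3 EMR client (e.g. `boto3.client("emr")`), that the cluster was created with instance groups rather than instance fleets, and that it has a single TASK group; the function name `scale_task_nodes` is hypothetical.

```python
from typing import Any


def scale_task_nodes(client: Any, cluster_id: str, target_count: int) -> str:
    """Resize the cluster's TASK instance group, leaving CORE nodes (and HDFS) alone.

    Assumes `client` is a boto3 EMR client and the cluster uses instance groups.
    Returns the Id of the instance group that was resized.
    """
    if target_count < 0:
        raise ValueError("target_count must be non-negative")

    # List the cluster's instance groups (MASTER, CORE, TASK).
    groups = client.list_instance_groups(ClusterId=cluster_id)["InstanceGroups"]
    task_groups = [g for g in groups if g["InstanceGroupType"] == "TASK"]
    if not task_groups:
        raise LookupError(f"cluster {cluster_id} has no TASK instance group")

    # Assumption: a single task group; with several, you would pick one by name.
    task_group_id = task_groups[0]["Id"]

    # Only the task group changes size, so no HDFS blocks need re-replicating.
    client.modify_instance_groups(
        ClusterId=cluster_id,
        InstanceGroups=[{"InstanceGroupId": task_group_id, "InstanceCount": target_count}],
    )
    return task_group_id
```

The CORE group is deliberately never touched here: task nodes run no HDFS DataNode, so adding or removing them does not trigger the rebalancing the question mentions.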
However, if I am planning to use EMRFS, do I need core nodes at all? What is the use of HDFS in this case if I am accessing data from S3?
You need at least 1 Core Node.
If you want to run s3distcp after a job finishes writing its output to local HDFS, then you need more Core Nodes, so that HDFS has enough capacity to hold that output before it is copied.
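Submitting such a copy as an EMR step might look like the sketch below. It assumes an EMR release (4.x or later) where `s3-dist-cp` is invoked through `command-runner.jar`, and that `client` is a boto3 EMR client; the helper names and the `hdfs:///output` / `s3://my-bucket/output` paths are hypothetical.

```python
from typing import Any


def s3distcp_step(src: str, dest: str, name: str = "Copy HDFS output to S3") -> dict:
    """Build an EMR step definition that runs s3-dist-cp via command-runner.jar."""
    return {
        "Name": name,
        # Keep the cluster running even if the copy fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["s3-dist-cp", f"--src={src}", f"--dest={dest}"],
        },
    }


def submit_s3distcp(client: Any, cluster_id: str, src: str, dest: str) -> str:
    """Add the copy step to a running cluster and return its step Id.

    Assumes `client` is a boto3 EMR client, e.g. boto3.client("emr").
    """
    response = client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[s3distcp_step(src, dest)])
    return response["StepIds"][0]
```

While this step runs, the job output still lives in HDFS on the core nodes, which is why their number (and disk capacity) matters in this pattern but not when jobs read and write S3 directly through EMRFS.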