
AWS EMR on VPC with EC2 Instance

I have been reading about AWS EMR on VPC, but the documentation seems to focus on design considerations for how the AWS EMR service reaches the EMR cluster to make calls.

What I am trying to do is set up a VPC with an ALB and an EC2 instance running an application as a service that accesses the EMR cluster.

VPC -> Internet Gateway -> Load Balancer -> EC2 (Application endpoints) -> EMR Cluster 

I don't want the cluster to be accessible from outside except through the public IP of the Internet Gateway. That public IP should reach only the EC2 instance hosting the application, which in turn calls the EMR cluster in the same VPC.

Is this a recommended approach?

The design looks something like the image below.

[image]

Some of the challenges I am tackling: how to access S3 from EMR when it is in a VPC, whether the application running on EC2 can access the EMR cluster, and whether the EMR cluster would be publicly available.

Any guidance links or recommendations would be welcome.

EDIT:

Or, if I create EMR in a VPC, do I need to wrap it inside another VPC, something like below?

[image]

The simplest design is:

  • Put everything in a public subnet in a VPC
  • Use Security Groups to control access to the EMR cluster

If you are security-paranoid, then you could instead:

  • Put publicly-accessible resources (e.g. EC2) in a public subnet
  • Put EMR in a private subnet
  • Use a NAT Gateway or VPC-Endpoints to allow EMR to communicate with S3 (which is outside the VPC)
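
For the VPC endpoint route, here is a minimal sketch of the parameters for an S3 gateway endpoint. It only builds the keyword arguments that would be passed to the boto3 EC2 client's `create_vpc_endpoint` call and does not contact AWS. The helper name, VPC ID, region and route table ID are all hypothetical placeholders, not values from the question.

```python
def s3_gateway_endpoint_params(vpc_id, region, route_table_ids):
    """Build keyword arguments for an S3 gateway endpoint (hypothetical helper)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # S3 service names follow the pattern com.amazonaws.<region>.s3
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Route tables of the private subnet(s) that hold the EMR cluster
        "RouteTableIds": list(route_table_ids),
    }


# Placeholder IDs for illustration only.
params = s3_gateway_endpoint_params(
    "vpc-example", "us-east-1", ["rtb-private-example"]
)
print(params)
```

With boto3 installed and credentials configured, these would be used roughly as `boto3.client("ec2").create_vpc_endpoint(**params)`. Note that a gateway endpoint covers S3 traffic only, so a NAT Gateway may still be needed if the cluster must reach anything else outside the VPC.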

The first option is simpler and Security Groups act as firewalls that can fully protect the EMR cluster. You would create three security groups:

  • ELB-SG: Permit inbound access from the Internet on your desired ports. Associate the security group with your Load Balancer.
  • EC2-SG: Permit inbound access from ELB-SG (from the Security Group itself). Associate the security group with your EC2 instances.
  • EMR-SG: Permit inbound access from EC2-SG (from the Security Group itself). Associate EMR-SG with the EMR cluster.

This will permit only the Load Balancer to communicate with the EC2 instances, and only the EC2 instances to communicate with the EMR cluster. The EMR cluster will be able to connect directly to the Internet to access Amazon S3, because the default security group rules permit all outbound access.
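
The chained security groups can be sketched as data. The example below only builds ingress rules in the shape boto3 expects for the `IpPermissions` argument of `authorize_security_group_ingress`, and it makes no AWS calls. The group IDs and the ports (443 for the load balancer, 8080 for the application, 8998 for the cluster) are assumptions chosen for illustration, so substitute whatever your application and EMR services actually use.

```python
# Hypothetical security group IDs; real ones come from your own VPC.
ELB_SG = "sg-elb-example"
EC2_SG = "sg-ec2-example"
EMR_SG = "sg-emr-example"


def from_cidr(port, cidr):
    """Ingress rule allowing one TCP port from a CIDR range."""
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}]}


def from_group(port, source_sg):
    """Ingress rule allowing one TCP port from members of another security group."""
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": source_sg}]}


# Each group only admits traffic from the tier directly in front of it.
INGRESS = {
    ELB_SG: [from_cidr(443, "0.0.0.0/0")],   # Internet -> load balancer
    EC2_SG: [from_group(8080, ELB_SG)],      # load balancer -> application (assumed port)
    EMR_SG: [from_group(8998, EC2_SG)],      # application -> EMR (assumed port)
}


def allowed_sources(group_id):
    """Collect every CIDR or source group that may reach the given group."""
    sources = set()
    for rule in INGRESS[group_id]:
        sources.update(r["CidrIp"] for r in rule.get("IpRanges", []))
        sources.update(p["GroupId"] for p in rule.get("UserIdGroupPairs", []))
    return sources


print(allowed_sources(EMR_SG))
```

Each list in `INGRESS` could then be passed as `IpPermissions` together with the matching `GroupId`. Because the EMR group references EC2-SG rather than an IP range, any instance added to EC2-SG later gains access without further rule changes.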
