As shown in the diagram, the pet project I am working on has the following two components:
a) The "RestAPI layer" (a set of micro-services)
b) The "Scalable Parallelized Algorithm" component.
I am planning on running this on AWS. I realized that I can use ElasticBeanstalk to deploy my RestAPI module (a Spring Boot JAR with embedded Tomcat).
I am thinking about how to architect the "Scalable Parallelized Algorithm" component. Here are some design details about this:
My questions:
1) Should I use EC2 to deploy the "Nodes", or can I use ElasticBeanstalk to deploy these nodes as well? I know that with EC2 I can manage the number of nodes depending on the size of the S3 data, but is it possible to do this with ElasticBeanstalk?
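Whichever service ends up running the Nodes, the "number of nodes depends on the size of the S3 data" rule has to be computed somewhere before it is applied (for example, as the desired capacity of an Auto Scaling group). A minimal sketch of that calculation, assuming each Node gets a fixed memory budget for its in-memory vector space; `NodeSizing`, `nodesNeeded` and the 12 GiB figure are hypothetical:

```java
// Hypothetical sizing helper: how many Nodes are needed so that each node's
// slice of the S3 data fits into an assumed per-node memory budget.
final class NodeSizing {

    private NodeSizing() {}

    /** Returns ceil(totalDataBytes / bytesPerNode), but never fewer than one node. */
    static int nodesNeeded(long totalDataBytes, long bytesPerNode) {
        if (totalDataBytes < 0 || bytesPerNode <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative, budget positive");
        }
        // Ceiling division without floating point and without overflow on large inputs.
        long nodes = totalDataBytes / bytesPerNode + (totalDataBytes % bytesPerNode == 0 ? 0 : 1);
        return (int) Math.max(1L, nodes);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // 100 GiB of data with an assumed 12 GiB usable per node.
        System.out.println(nodesNeeded(100 * gib, 12 * gib)); // 9
    }
}
```

The real budget depends on how much larger the in-memory vector space is than the raw S3 bytes, which only measurement on the actual data can tell.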
2) Can I use
Inet4Address.getLocalHost().getHostAddress()
to get the IP of each Node? Do EC2 instances have more than one IP? This IP should allow the RestAPI layer to communicate with the "master" Node.
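`InetAddress.getLocalHost()` (which `Inet4Address.getLocalHost()` resolves to) returns a single address looked up from the machine's hostname, and depending on the hosts file this can even be a loopback address. A small sketch that prints it next to every IPv4 address actually bound to a network interface makes the difference visible. On EC2 the public IPv4 is mapped to the instance by AWS and does not appear on the OS interface, so only private addresses show up here. `LocalAddresses` and `nonLoopbackIpv4` are illustrative names:

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class LocalAddresses {

    private LocalAddresses() {}

    /** Lists every IPv4 address bound to an up, non-loopback interface on this host. */
    static List<String> nonLoopbackIpv4() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // some JDKs return null when no interface is configured
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (!nic.isUp() || nic.isLoopback()) {
                continue; // skip interfaces that are down, and "lo"
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet4Address) {
                    result.add(addr.getHostAddress());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // The approach from the question: one address resolved from the hostname.
        System.out.println("getLocalHost: " + InetAddress.getLocalHost().getHostAddress());
        // Every IPv4 address actually bound to an interface.
        System.out.println("interfaces:   " + nonLoopbackIpv4());
    }
}
```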
3) Which component should I use to expose my RestAPI layer to the external world? I don't want to expose my "Nodes", though.
Update: I can't use MapReduce since the nodes have state, i.e. during initialization each Node reads its chunk of data from S3 and creates the "vector space" in memory. This is a time-consuming process, which is why the result should be kept in memory. Also, this system needs near-real-time responses, so I cannot use a batch system like MR.
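Because each stateful Node loads "its chunk" of S3 data at startup, every node needs a deterministic way to agree on which chunk is its own. A minimal sketch, assuming the data is split into many S3 objects, every node sees the same key listing, and each node knows its own index and the total node count (how it learns those is left open); `ChunkAssignment` and `keysForNode` are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: decides which S3 object keys a given Node loads into its
 * in-memory vector space. Assumes nodeIndex is in [0, nodeCount).
 */
final class ChunkAssignment {

    private ChunkAssignment() {}

    static List<String> keysForNode(List<String> allKeys, int nodeIndex, int nodeCount) {
        if (nodeCount <= 0 || nodeIndex < 0 || nodeIndex >= nodeCount) {
            throw new IllegalArgumentException("invalid node index/count");
        }
        // Sort a copy so every node derives the same order regardless of listing order.
        List<String> sorted = new ArrayList<>(allKeys);
        sorted.sort(null); // natural (lexicographic) order
        List<String> mine = new ArrayList<>();
        // Round-robin: the i-th key belongs to node i % nodeCount.
        for (int i = 0; i < sorted.size(); i++) {
            if (i % nodeCount == nodeIndex) {
                mine.add(sorted.get(i));
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("part-03", "part-00", "part-02", "part-01", "part-04");
        for (int node = 0; node < 2; node++) {
            System.out.println("node " + node + " -> " + keysForNode(keys, node, 2));
        }
        // node 0 -> [part-00, part-02, part-04]
        // node 1 -> [part-01, part-03]
    }
}
```

The trade-off of plain round-robin is that changing the node count reshuffles almost every chunk, so every node would have to rebuild its vector space; a scheme such as consistent hashing moves fewer keys when nodes are added or removed.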
https://aws.amazon.com/cloudformation/faqs/
curl http://169.254.169.254/latest/meta-data/public-ipv4
or
curl http://169.254.169.254/latest/meta-data/local-ipv4
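The same metadata can be read from inside the Java process, so a Node can report its private IP to the master at startup. The sketch below assumes Java 11+ (`java.net.http.HttpClient`) and uses the IMDSv2 flow (PUT for a session token, then GET with the token header), which also works on instances where token-less IMDSv1 requests like the curl calls above have been disabled. It only works on an EC2 instance; `InstanceMetadata` is an illustrative name:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Sketch: read EC2 instance metadata items such as "local-ipv4" via IMDSv2. */
final class InstanceMetadata {

    private static final String BASE = "http://169.254.169.254/latest/";
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    private InstanceMetadata() {}

    /** Builds the URI for a metadata item such as "local-ipv4" or "public-ipv4". */
    static URI metadataUri(String item) {
        return URI.create(BASE + "meta-data/" + item);
    }

    static String get(String item) throws IOException, InterruptedException {
        // Step 1: request a short-lived session token (TTL in seconds).
        HttpRequest tokenRequest = HttpRequest.newBuilder(URI.create(BASE + "api/token"))
                .header("X-aws-ec2-metadata-token-ttl-seconds", "60")
                .PUT(HttpRequest.BodyPublishers.noBody())
                .timeout(Duration.ofSeconds(2))
                .build();
        HttpResponse<String> tokenResponse = CLIENT.send(tokenRequest, HttpResponse.BodyHandlers.ofString());
        if (tokenResponse.statusCode() != 200) {
            throw new IOException("token request returned HTTP " + tokenResponse.statusCode());
        }

        // Step 2: read the requested item, presenting the token (GET is the default method).
        HttpRequest request = HttpRequest.newBuilder(metadataUri(item))
                .header("X-aws-ec2-metadata-token", tokenResponse.body())
                .timeout(Duration.ofSeconds(2))
                .build();
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            // e.g. the item does not exist for this instance
            throw new IOException("metadata " + item + " returned HTTP " + response.statusCode());
        }
        return response.body().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("private IPv4: " + get("local-ipv4"));
    }
}
```

For node-to-node and RestAPI-to-master traffic inside the VPC, the private `local-ipv4` address is usually the one to use.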
Full reference for EC2 instance metadata:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
Some general principles
ElasticBeanstalk vs. "manual" setup
ElasticBeanstalk sounds like a good choice to me, but it's important to see that it uses the same components I would recommend:
To be clear, ElasticBeanstalk does something similar. If you create a multi-node Beanstalk stack, it actually runs a CloudFormation template that creates an Elastic Load Balancer (ELB), an Auto Scaling group (ASG), a launch configuration (LCFG), and the instances. You just have a bit less control, but also less management overhead.
If you go with Beanstalk, you need a Worker Environment, which also creates the SQS queue for you. Worker Environments also come with tutorials and working examples, which make it easier to get started.
Further reading: Background Task Handling for AWS Elastic Beanstalk Architectural Overview
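As I understand the Worker Environment model, a daemon on each instance pulls messages from the SQS queue and POSTs each message body to the application over HTTP; a 200 response lets the message be deleted, while an error or timeout lets it become visible again for another attempt. A minimal sketch of the receiving side using the JDK's built-in `com.sun.net.httpserver` instead of Spring Boot; the port (5000, overridable via a `PORT` environment variable), the root path `/`, and `processTask` are assumptions to check against your environment's configuration:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Sketch of a worker endpoint that receives SQS messages forwarded as HTTP POSTs. */
final class WorkerEndpoint {

    private WorkerEndpoint() {}

    /** Hypothetical task handler; a real one might hand the work to the master Node. */
    static void processTask(String body) {
        if (body == null || body.isBlank()) {
            throw new IllegalArgumentException("empty task message");
        }
        // ... parse the task and dispatch it (left open here) ...
    }

    /** Maps the outcome of processing to the HTTP status returned to the daemon. */
    static int statusFor(String body) {
        try {
            processTask(body);
            return 200; // success: the message can be deleted from the queue
        } catch (RuntimeException e) {
            return 500; // failure: the message becomes visible again for a retry
        }
    }

    public static void main(String[] args) throws IOException {
        int port = Integer.parseInt(System.getenv().getOrDefault("PORT", "5000"));
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            String body = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            int status = "POST".equals(exchange.getRequestMethod()) ? statusFor(body) : 405;
            exchange.sendResponseHeaders(status, -1); // -1: no response body
            exchange.close();
        });
        server.start();
        System.out.println("worker listening on port " + port);
    }
}
```

In the Spring Boot module the same logic would sit behind a POST controller method; the status code returned is what drives retries.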
2) You can use the AWS CLI, which has some filtering capabilities, or pipe its output through tools like jq for filtering/formatting. Here is a similar example. Note: tag your instances, and then you can easily filter them; alternatively, you can query the instances based on the ELB/ASG they belong to.
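A sketch of the tag-based lookup driven from Java: it builds an `aws ec2 describe-instances` call that filters running instances by tag and prints only their private IPs via `--query`. The tag `Role=node` and the names `NodeLookup`/`describeNodesCommand` are hypothetical, and running `main` requires the AWS CLI to be installed with credentials configured; an AWS SDK call would avoid the external process:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: lists private IPs of running instances carrying a given tag. */
final class NodeLookup {

    private NodeLookup() {}

    static List<String> describeNodesCommand(String tagKey, String tagValue) {
        return List.of(
                "aws", "ec2", "describe-instances",
                "--filters",
                "Name=tag:" + tagKey + ",Values=" + tagValue, // only instances with this tag
                "Name=instance-state-name,Values=running",     // ignore stopped/terminated ones
                "--query", "Reservations[].Instances[].PrivateIpAddress",
                "--output", "text");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(describeNodesCommand("Role", "node"))
                .redirectErrorStream(true)
                .start();
        String out = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8).trim();
        if (process.waitFor() != 0) {
            System.err.println("aws cli failed: " + out);
            return;
        }
        // Text output separates the addresses with whitespace.
        List<String> ips = out.isEmpty() ? List.of() : Arrays.asList(out.split("\\s+"));
        System.out.println(ips);
    }
}
```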
3) Exposing your API via API Gateway sounds like a good solution. I assume you want to expose only the master node(s), since they are what manages the tasks.