How do I provision data files in an autoscaled (multi-instance) Elastic Beanstalk (Tomcat) application on AWS?

Question

I currently have one Elastic Beanstalk instance running a Java application that is deployed to Tomcat. I deploy the application using the web interface, but the application uses a data file (a Lucene index) referenced in web.xml. I copy that file to the underlying EC2 instance by ssh-ing in and fetching it from my S3 bucket.

So far so good.

But if I change my EB environment to an autoscaling one, so that it automatically creates new instances as required, those EC2 instances will not have the data file. How do I deal with this?

Can I preconfigure each EC2 instance with the data file before it is actually used?
Can I have a shared filesystem that each server can refer to (the data files are read-only)?

* Update *

I think I've worked out the answer in principle. I was uploading my application from my local machine and then adding the large data files later from Amazon. What I need to do is build my WAR on my data-processing EC2 instance, add the data file to the WAR somewhere, put the WAR on S3, and then, when I create my EB environment, load the WAR from the S3 bucket.

So I just need to work out where the data file should go in the WAR and how to create it via the Maven build process.

* Update 2 *

Actually, it's not clear that the data files should go in the WAR file after all. I cannot see where to put them, and the application expects them to be real files, so if they were contained within the WAR and the WAR was not expanded/unjarred (I don't know what EB does), the application would not work anyway.

* Update 3 *

I could certainly put the data in S3 (in fact it will probably be there to start with). So I wonder whether, on server initialization, I could get the S3 data, put it somewhere and then use it? Guidance please.

* Update 4 *

So using the S3 idea I nearly have it working. Within the servlet init() method I get the compressed file, save it to the current working directory (/usr/share/tomcat7/) and then uncompress it. The trouble is that the compressed file is 2.7 GB and the uncompressed folder it expands to is 5 GB, while the micro instance used by EB offers 8 GB, of which 2 GB is used. So I have 6 GB, which is enough space for the uncompressed files, but not enough to save the compressed file and then uncompress it, because I need 2.7 GB + 5 GB during the uncompressing process.

I loaded the compressed version to S3 because the original data is not a single file but a folder full of files, which would be difficult to manage as a list of files. I cannot change the size of the root directory in EB. I could try changing to a more powerful instance, but that would be unnecessarily expensive, and it is not clear what disk space is provided with the instances used by EB. Any ideas?
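The space arithmetic in Update 4 can be written down as a small pre-flight check. This is only an illustrative sketch: the class and method names are hypothetical, and the sizes are the approximate figures quoted above (2.7 GB compressed, 5 GB extracted, 6 GB free).

```java
import java.io.File;

class DiskSpaceCheck {

    // Decimal gigabyte, used only to express the figures quoted in the post
    static final long GB = 1_000_000_000L;

    /** True if the archive and its extracted contents fit on the disk at the same time. */
    static boolean fitsDownloadThenExtract(long compressedBytes, long extractedBytes, long freeBytes) {
        return compressedBytes + extractedBytes <= freeBytes;
    }

    /** True if the extracted contents alone fit (the archive is kept elsewhere or never stored). */
    static boolean fitsExtractOnly(long extractedBytes, long freeBytes) {
        return extractedBytes <= freeBytes;
    }

    public static void main(String[] args) {
        long compressed = 2_700_000_000L; // about 2.7 GB
        long extracted = 5 * GB;          // about 5 GB
        long free = 6 * GB;               // 8 GB disk minus 2 GB already used

        System.out.println("download then extract fits: "
                + fitsDownloadThenExtract(compressed, extracted, free));
        System.out.println("extract only fits: " + fitsExtractOnly(extracted, free));

        // On a real instance the free space can be read instead of assumed
        System.out.println("usable bytes here: " + new File(".").getUsableSpace());
    }
}
```

With these figures the first check fails (7.7 GB needed against 6 GB free) and the second passes, which is exactly the situation described.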

These were the dependencies I added to my Maven pom.xml:

    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk</artifactId>
        <version>1.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.rauschig</groupId>
        <artifactId>jarchivelib</artifactId>
        <version>0.6.0</version>
    </dependency>

And this is the code:

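import java.io.File;
import java.util.logging.Level;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.GetObjectRequest;
import org.rauschig.jarchivelib.Archiver;
import org.rauschig.jarchivelib.ArchiverFactory;

// log and indexDir are presumably fields of the enclosing servlet (not shown)
@Override
public void init()
{
        try
        {
            log.severe("Retrieving Indexes from S3");
            AWSCredentials credentials      = new BasicAWSCredentials("***********", "***********");
            AmazonS3Client ac = new AmazonS3Client(credentials);

            // Small test object first, to check that bucket access works
            log.severe("datalength-testfile:"+ac.getObjectMetadata("widget","test.txt").getContentLength());
            File testFile = new File("test.txt");
            ac.getObject(new GetObjectRequest("widget", "test.txt"), testFile);
            log.severe("datalength-testfile:retrieved");

            // The compressed index is saved to the current working directory
            log.severe("datalength-largefile:"+ac.getObjectMetadata("widget","indexes.tar.gz").getContentLength());
            File largeFile = new File("indexes.tar.gz");
            ac.getObject(new GetObjectRequest("widget", "indexes.tar.gz"), largeFile);
            log.severe("datalength-largefile:retrieved");
            log.severe("Retrieved Indexes from S3");

            // Extract the archive into the index directory
            log.severe("Unzipping Indexes");
            File indexDirFile = new File(indexDir).getAbsoluteFile();
            indexDirFile.mkdirs();
            Archiver archiver = ArchiverFactory.createArchiver(largeFile);
            archiver.extract(largeFile, indexDirFile);
            log.severe("Unzipped Indexes");

        }
        catch(Exception e)
        {
            log.log(Level.SEVERE, e.getMessage(), e );
        }
}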
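One way around the disk-space problem from Update 4 would be to decompress the download as a stream, so the compressed copy never lands on disk. The sketch below is not the author's code; it uses only the JDK and an in-memory stream standing in for the S3 object body (with the SDK above, that stream would presumably come from getObject(...).getObjectContent()). It handles the gzip layer only; unpacking the tar entries would additionally need a tar reader such as Apache Commons Compress's TarArchiveInputStream. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StreamingGunzip {

    /** Decompresses a gzip stream straight into target; the compressed bytes are never written to disk. */
    static long gunzipTo(InputStream compressed, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(compressed)) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Demo helper only: gzip a byte array in memory. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "pretend this is a Lucene index".getBytes(StandardCharsets.UTF_8);
        // Stand-in for the S3 object body
        InputStream fromS3 = new ByteArrayInputStream(gzip(original));
        Path target = Files.createTempFile("index", ".bin");
        long written = gunzipTo(fromS3, target);
        System.out.println("wrote " + written + " bytes to " + target);
        Files.delete(target);
    }
}
```

Peak disk usage with this approach is the size of the extracted data alone, at the cost of having to restart the whole download if the stream fails part-way.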

* Update 5 *

Having realized that the micro EC2 instance only provides 0.6 GB, not 6 GB, I needed to move to a larger machine anyway. That machine provided two disks, so I could copy the compressed file to one disk and then uncompress it to the root disk successfully. So I am ready to go.

* Update 6 *

EB does not respect the init() method, so in an autoscaled EB configuration it starts up other EC2 instances, believing the first one to be overloaded when in fact it is just getting ready. And I suspect that if it starts new ones when genuinely busy, the load balancer will start feeding requests to these instances before they are ready, causing failed requests.
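A common mitigation for the readiness problem in Update 6, which the post itself does not try, is to expose a health-check endpoint that reports failure until the index has finished loading, and to point the load balancer's health check at it. The sketch below shows only the readiness logic, with no servlet API; ReadinessGate and its methods are hypothetical names, and it assumes the health check treats any non-200 status as unhealthy.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ReadinessGate {

    private final AtomicBoolean ready = new AtomicBoolean(false);

    /** Called by the loader once the index is fully downloaded and extracted. */
    void markReady() {
        ready.set(true);
    }

    boolean isReady() {
        return ready.get();
    }

    /** Status a health-check handler would return: 200 once ready, 503 before that. */
    int healthStatus() {
        return ready.get() ? 200 : 503;
    }

    public static void main(String[] args) throws InterruptedException {
        ReadinessGate gate = new ReadinessGate();
        Thread loader = new Thread(() -> {
            // Placeholder for the long-running download and extraction
            gate.markReady();
        });
        System.out.println("before load: " + gate.healthStatus());
        loader.start();
        loader.join();
        System.out.println("after load: " + gate.healthStatus());
    }
}
```

A servlet mapped to the health-check URL would simply set its response status from healthStatus(), while init() starts the loader in the background instead of blocking.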

* Update 7 *

Tried putting the indexes directly into WEB-INF/classes and referring to that location in web.xml. This works on a local test Tomcat deployment but fails in EB. It seems EB does not respect init(), so instead of trying to get the indexes from S3 within the init() method, I put the indexes directly into the WAR file under WEB-INF/classes and pointed the parameter in my web.xml there. Although they are not actually classes, this does not cause a problem for Tomcat, and I have tested deployment against a local Tomcat installation without problems.

Unfortunately, having uploaded this larger WAR file containing the indexes to S3, the attempt to deploy it to EB from the S3 location fails with:

Could not launch environment: Source bundle is empty or exceeds maximum allowed size: 524288000.

Why has Amazon imposed this arbitrary limit?
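For reference, the number in the error message is a byte count: 524288000 bytes is exactly 500 MiB (500 x 1024 x 1024). A small sketch of a check that could be run before uploading follows; the class name and the example WAR size are hypothetical, and the limit is simply the figure quoted in the error.

```java
class BundleSizeCheck {

    // Limit quoted in the Elastic Beanstalk error message, in bytes
    static final long MAX_BUNDLE_BYTES = 524_288_000L;

    /** Mirrors the error text: a bundle is rejected if it is empty or larger than the limit. */
    static boolean fits(long bundleBytes) {
        return bundleBytes > 0 && bundleBytes <= MAX_BUNDLE_BYTES;
    }

    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.println("limit: " + toMiB(MAX_BUNDLE_BYTES) + " MiB"); // limit: 500.0 MiB

        // Illustrative size only: a WAR carrying a multi-gigabyte index
        long warWithIndex = 2_700L * 1024 * 1024;
        System.out.println("WAR with index fits: " + fits(warWithIndex));
    }
}
```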

* Update 8 *

So the possible options are:

ebextensions
Docker deployment
Create a custom Amazon image for use with EB

The third option seems very hacky; I am not at all keen on that, or very keen on the others really.

* Update 9 *

I got it working with ebextensions in the end, and it wasn't too bad. I document it here in case it is useful.

If using Maven, create a folder named ebextensions under src/main (that is, src/main/ebextensions, matching the <directory> in the configuration below). Add the following to pom.xml (so that ebextensions goes in the right place in the final WAR):

            <plugin>
                <artifactId>maven-war-plugin</artifactId>
                <configuration>
                    <webResources>
                        <resource>
                            <directory>src/main/ebextensions</directory>
                            <targetPath>.ebextensions</targetPath>
                            <filtering>true</filtering>
                        </resource>
                    </webResources>
                </configuration>
            </plugin>

Create a .config file in the ebextensions folder (I called mine copyindex.cfg); mine had this content:

commands:
  01_install_cli:
    command: wget https://s3.amazonaws.com/aws-cli/awscli-bundle.zip; unzip awscli-bundle.zip; ./awscli-bundle/install -b ~/bin/aws

  02_get_index:
    command:
      aws s3 cp --region eu-west-1 s3://jthink/release_index.tar.gz /dev/shm/release_index.tar.gz;
      cd /usr/share/tomcat7; tar -xvf /dev/shm/release_index.tar.gz

Go to the IAM console (https://console.aws.amazon.com/iam/home?#home) and attach the Power User role policy to the Elastic Beanstalk role.

Deploy your application.

Answer 1

There are multiple ways of achieving this. You do not need to ssh to the instance and copy your files.

I would recommend the approach in your "Update 3".

You can configure your Elastic Beanstalk environment to execute commands before deploying the application. You can do this using ebextensions; see the Elastic Beanstalk documentation on commands.

Essentially, you create a folder named .ebextensions in your app source. This folder can contain one or more files with a .config extension. These files are processed in lexicographical order of their names. You can execute shell commands by using ebextensions. For example, you can do the following:

commands:
  02_download_index: 
    command: aws s3 cp s3://mybucket/test.txt test2.txt

You will need to install the AWS CLI on your EC2 instances first. This can again be done with a command similar to the one above; Amazon documents how to install the AWS CLI using the bundled installer. You can run more than one command. The commands within a config file are executed in lexicographical order, so you can name your commands 01_install_awscli, 02_download_index, etc.
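Because the ordering is lexicographical rather than numeric, the zero-padded prefixes matter once there are ten or more commands. The sketch below illustrates this with Java's natural String ordering, on the assumption that it matches the ordering Elastic Beanstalk applies to plain ASCII names; the command name 10_restart_app is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CommandOrder {

    /** Returns the names in lexicographical order, i.e. the order they would run in. */
    static List<String> executionOrder(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted); // natural String order is lexicographical
        return sorted;
    }

    public static void main(String[] args) {
        // Zero-padded prefixes sort the way they read
        System.out.println(executionOrder(
                Arrays.asList("10_restart_app", "02_download_index", "01_install_awscli")));
        // prints [01_install_awscli, 02_download_index, 10_restart_app]

        // Unpadded prefixes do not: "10_..." sorts before "1_..." and "2_..."
        System.out.println(executionOrder(
                Arrays.asList("10_restart_app", "2_download_index", "1_install_awscli")));
        // prints [10_restart_app, 1_install_awscli, 2_download_index]
    }
}
```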

Now, if you plan to use the AWS CLI on the EC2 instance, you will also need credentials. If you are using an IAM instance profile (most likely you are; if not, read about it in the AWS documentation), you can give your instance profile permission to access your S3 object using IAM. That way your instances will have an IAM instance profile associated with them and will be able to download the file from S3. Alternatively, you can get the ACCESS_KEY_ID and SECRET_KEY directly using environment properties.
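If the environment-properties route is used from Java code, the values should be read at runtime rather than hardcoded. The helper below is a hypothetical sketch: it assumes the platform exposes each property either as a JVM system property or as an OS environment variable, and checks both. The property names are the ones mentioned above.

```java
import java.util.Optional;

class EnvProperty {

    /** Looks a name up as a JVM system property first, then as an OS environment variable. */
    static Optional<String> lookup(String name) {
        String value = System.getProperty(name);
        if (value == null || value.isEmpty()) {
            value = System.getenv(name);
        }
        return (value == null || value.isEmpty()) ? Optional.empty() : Optional.of(value);
    }

    public static void main(String[] args) {
        Optional<String> accessKey = lookup("ACCESS_KEY_ID");
        Optional<String> secretKey = lookup("SECRET_KEY");
        if (accessKey.isPresent() && secretKey.isPresent()) {
            System.out.println("credentials found in environment properties");
        } else {
            // With an IAM instance profile there is nothing to look up here
            System.out.println("no explicit credentials configured");
        }
    }
}
```

With an instance profile attached, none of this is needed on the instance, which is the main reason to prefer that option.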

All new instances that come up should execute the commands in your ebextensions. Thus your instances can be preconfigured with the software that you want.

Solution 1
4 Accepted 2014-06-30 04:35:29
