
Including data in MySQL Docker container

This question is similar to:

Setting up MySQL and importing dump within Dockerfile

But the answer to that question did not address my use case.

I have a MySQL database that has 5TB of data in production. For development, I only need about 500MB of that data. The integration tests that run as part of the build of my application require access to a MySQL database. Currently, that database is being created on Jenkins and the data is being injected into it by the build process. This is very slow.

I would like to replace this part of the process with Docker. My idea is that I would have a Docker container that runs MySQL with my 500MB of data already baked into it, rather than relying on the standard process of the MySQL Docker image, which only executes the MySQL import when the container launches. Based on tests to date, the standard process takes 4 to 5 minutes, whereas I would like to get this down to seconds.
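For context, the standard process referred to here is the official mysql image's init hook: on first start, the entrypoint imports any .sql files found under /docker-entrypoint-initdb.d. A minimal sketch of that baseline, with dev-data.sql as a hypothetical dump of the 500MB subset:

    # Standard approach: mount the dump into the init directory; the
    # entrypoint imports it during the container's *first* start, which
    # is the 4-5 minute step described above.
    docker run -d --name dev-db \
      -e MYSQL_ROOT_PASSWORD=secret \
      -e MYSQL_DATABASE=app \
      -v "$PWD/dev-data.sql:/docker-entrypoint-initdb.d/dev-data.sql:ro" \
      -p 3306:3306 \
      mysql:5.7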

I would have thought this would be a common use case, but pre-baking data in MySQL Docker containers seems to be frowned upon, and there isn't really any guidance regarding this method.

Has anyone any experience in this regard? Is there a very good reason why data should not be pre-baked into a MySQL Docker container?

Based on the investigation I have done into this, it isn't really possible to include data in a container that uses the standard MySQL image as its base.

I tried to get around this by deploying a container from this base image, manipulating it, and then committing it to a new image.

However, there is a key thing to understand about the MySQL base image. Both its data directory (/var/lib/mysql/) and its config directory (/etc/mysql/) are set up as Docker volumes, which means their contents map to locations on your host system.

Volumes like these aren't saved as part of a commit, so you can't manipulate the data and then save it. In addition, the image has features that prevent manipulation of these locations via ENTRYPOINT routines.
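You can see this for yourself with the standard docker CLI (the container and image names below are hypothetical):

    # List the volumes declared by the MySQL base image; any path shown
    # here is excluded from `docker commit` snapshots.
    docker image inspect mysql:5.7 --format '{{json .Config.Volumes}}'
    # Typically prints {"/var/lib/mysql":{}} (exact paths depend on the
    # image version).

    # Consequently, committing a container after running an import only
    # captures the filesystem *outside* those paths; the imported data
    # under /var/lib/mysql does not end up in the new image.
    docker commit some-mysql my-preloaded-mysql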

All of this is by design, as it is envisaged that this image will be used with either persistent or independent data sets. It would be nice if there were an option to include the data in the container, but this looks like something the developers really do not want to entertain.

To resolve my issue, I went back to a base Ubuntu image, built my DB on it, and committed that to a new image, which works fine. The container size is a bit larger, but the deployment as part of our build job is significantly faster than waiting for the MySQL-based container to run the 500MB import at startup.
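The answer above built the image by hand and committed it; the same idea can be expressed as a Dockerfile, which makes regenerating the image with fresh data a one-command step. A minimal sketch, assuming Ubuntu 18.04's packaged MySQL and a hypothetical dev-data.sql dump:

    FROM ubuntu:18.04

    # Install the MySQL server package non-interactively.
    RUN apt-get update && \
        DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server && \
        rm -rf /var/lib/apt/lists/*

    # Allow connections from outside the container (Ubuntu's default
    # config binds MySQL to 127.0.0.1 only).
    RUN sed -i 's/^bind-address.*/bind-address = 0.0.0.0/' \
        /etc/mysql/mysql.conf.d/mysqld.cnf

    # dev-data.sql is a hypothetical ~500MB dump of the development subset.
    COPY dev-data.sql /tmp/dev-data.sql

    # Import at *build* time; because this image declares no VOLUME for
    # /var/lib/mysql, the loaded data is captured in the image layers.
    RUN service mysql start && \
        mysql -e 'CREATE DATABASE app' && \
        mysql app < /tmp/dev-data.sql && \
        service mysql stop && \
        rm /tmp/dev-data.sql

    EXPOSE 3306
    CMD ["mysqld_safe"]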

The main argument against this is that your image is a snapshot of the data and the schema at a point in time - it will get stale quickly, and you'll need a good process for easily generating new images with fresh data, to keep it useful without being expensive to maintain.

That said, I wouldn't frown upon this - I think it's a particularly good use case for a non-production Docker image. A 500MB image is pretty cheap to move around, so you could have lots of them - tagged versions for different releases of your database schema, and even multiple images with different datasets for different test scenarios.

A pre-loaded database container should start in seconds, so you can easily run the relevant container as a step in your build pipeline before running integration tests. Just be aware of the maintenance overhead - I would look at automating the extraction of data from live, plus the cleansing, shrinking and packaging, right from the start.
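As a usage illustration, a pipeline step along these lines would start the pre-loaded container and wait for the port before running the tests (the image tag and test command are hypothetical):

    # Start the pre-loaded database; no import step runs at launch.
    docker run -d --name it-db -p 3306:3306 my-preloaded-mysql:schema-v42

    # Wait for MySQL to accept connections; with the data already baked
    # in, this should only take a few seconds.
    until mysqladmin ping -h 127.0.0.1 --silent; do sleep 1; done

    # Run the integration tests against it, then dispose of the container.
    ./run-integration-tests.sh   # hypothetical test entry point
    docker rm -f it-db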
