

Architecture for uploading large files from many endpoints to cloud storage

Original post: 2019-04-30 13:24:53 · Tags: file-upload, server, cloud-storage

I am working on a desktop app that offers uploading to cloud storage. Storage providers make it easy to upload files: you get an accessKeyId and a secretAccessKey and you are ready to upload. I am trying to come up with the optimal way to upload files.

Option 1. Pack each app instance with access keys. This way files can be uploaded directly to the cloud without a middleman. Unfortunately, I cannot execute any logic before uploading to the cloud. For example, if each user has 5 GB of storage available, I cannot enforce this constraint at the storage provider itself. I haven't found any provider that does that. I could send a request to my own server before the upload to verify it, but since the keys are hardcoded in the app, I am sure this is easy to exploit.

Option 2. Send each uploaded file to a server, where the constraint logic can be executed, and forward the file to the final cloud storage. This approach suffers from a bottleneck at the server. For example, if 100 users start uploading (or downloading) a 1 GB file and the server has a bandwidth of 1000 Mb/s, then each user uploads at only 10 Mb/s = 1.25 MB/s.
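To make the arithmetic concrete, here is a small sketch. It assumes the server's link is shared evenly between users and ignores protocol overhead; the function names are illustrative only.

```python
def per_user_rate_mbit(link_mbit_per_s: float, users: int) -> float:
    """Even share of the server link per user, in megabits per second."""
    return link_mbit_per_s / users


def transfer_seconds(file_gb: float, rate_mbit_per_s: float) -> float:
    """Seconds to move a file of file_gb gigabytes (1 GB = 8000 megabits)."""
    return file_gb * 8000 / rate_mbit_per_s


rate = per_user_rate_mbit(1000, 100)   # share of a 1000 Mb/s link for 100 users
print(rate, rate / 8)                  # rate in Mb/s and in MB/s
print(transfer_seconds(1, rate) / 60)  # minutes needed for a 1 GB file
```

Under these assumptions a 1 GB file takes a little over 13 minutes per user, which is the bottleneck the question describes.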

Option 2 seems to be the way to go, because I get control over who can upload and the keys aren't shared publicly. I am looking for tips to minimise the bandwidth bottleneck. What approach is recommended for handling simultaneous uploads of large files to cloud storage? I am thinking of deploying many low-CPU, low-memory instances and streaming each file through, instead of buffering the whole file first and sending it afterwards.
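The difference between streaming and buffering can be sketched roughly as follows. This is a minimal illustration, not a real relay server: in-memory streams stand in for the client connection and the storage upload, and the chunk size is an arbitrary assumption.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per read; an arbitrary illustrative value


def relay(source: BinaryIO, sink: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy source to sink chunk by chunk and return the number of bytes forwarded.

    At most one chunk is held in memory at a time, regardless of file size.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for the incoming upload and the outgoing storage request.
incoming = io.BytesIO(b"x" * 1000)
outgoing = io.BytesIO()
print(relay(incoming, outgoing, chunk_size=256))
```

Streaming like this keeps memory use per upload bounded, which is what makes low-memory instances viable. It does not change the bandwidth arithmetic: every byte still crosses the relay's link twice, once in and once out.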

1 Answer

I believe asking for architecture validation and improvement is out of scope for this forum, but I'll bite. Also, some aspects are not clear. I assume you mean you'll upload files to something like S3, but you'll limit how much users can upload based on how much they are paying.

You can go with Option 1: upload directly to the storage provider, but validate with your server first. You'll need to be able to:

Identify each user. A simple UUID might do the trick, or go with full username/password authentication.
Have a database that keeps track of each client's usage.
Encrypt communication between the desktop app and your server with your own key pair, in addition to HTTPS. If you're not clear on how public-key cryptography works, you should look it up.
Use temporary access keys for each provider and find a way to manage them.

These will increase your cost, though not as much as Option 2 will.

Your app will make an API call to your server before uploading in order to determine whether the upload is valid. Any answer (or lack of one) that is not a good answer means the upload fails. That also means you're introducing a single point of failure in your architecture, and you had better make sure your server is always up and available as long as you still have users, otherwise you'll be in breach of Wheaton's Law. My advice: go serverless here.

You will use temporary access_key/secret_key pairs to upload the files. The desktop app will upload the file directly to whatever provider you're dealing with, but it will use a key/secret pair that changes every, say, 12 hours. Each user gets their own pair, and you need to make sure that a user only has access to their own files. Otherwise they'll be able to access everyone's files and you'll be breaking Wheaton's Law. This way, even if they somehow figure out what the secret is, they will only have access for 12 hours at most, after which you will change the keys and cut them off.
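One way to express "a user only has access to their own files" is a per-user policy document. The sketch below assumes an S3-style provider with IAM-style JSON policies; the bucket name and the `users/<uuid>/` key layout are hypothetical choices, not something the provider dictates.

```python
import json

TWELVE_HOURS = 12 * 60 * 60  # intended credential lifetime in seconds


def user_policy(bucket: str, user_uuid: str) -> str:
    """Build a JSON policy allowing uploads and downloads only under the user's prefix."""
    prefix = f"users/{user_uuid}/"  # hypothetical layout: one prefix per user
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }
    return json.dumps(policy)


# "example-bucket" and the UUID below are placeholders for illustration.
print(user_policy("example-bucket", "3f2b8c1e-0000-4000-8000-000000000000"))
```

On AWS, a document like this could be supplied as a session policy when requesting temporary credentials from STS, together with the desired lifetime. Other providers have their own mechanisms, so check the exact call and its lifetime limits in your provider's documentation.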

All communication between the app and your server is encrypted using public-key cryptography. The private key is stored on your server; the user gets the public key. That way you can easily update the encryption keys if needed, because the public key is public. Remember, this provides encryption, not authentication.

You can easily invalidate a user's access by changing the access_key/secret_key pair(s) they use to communicate directly with the storage provider(s) and the private key used to communicate with your server.

Your server should keep track of each user's files and validate that what is in your server-side database is the same as what's in storage. Do it regularly: daily, weekly, every 2 hours, whatever works for you. If you find inconsistencies, investigate. Maybe they are trying to cheat. Or maybe your app has a bug. That means you have to be able to identify at the storage level which file belongs to which user. This can be as easy as storing all files for a user in a directory named with their UUID. Do not use names or emails there. No personally identifiable data should be stored anywhere except in your database. Even there, store it only if needed, and it should be encrypted.
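A periodic consistency check along these lines might look like the sketch below. It assumes both the database and the storage listing can be reduced to a mapping from object key to size in bytes, and that keys are laid out as `<uuid>/<file name>`; both are illustrative assumptions.

```python
def reconcile(db_files: dict[str, int], stored_files: dict[str, int]) -> dict[str, list[str]]:
    """Compare {key: size_in_bytes} from the database against the storage listing."""
    return {
        # Recorded in the database but absent from storage.
        "missing": sorted(k for k in db_files if k not in stored_files),
        # Present in storage but unknown to the database.
        "untracked": sorted(k for k in stored_files if k not in db_files),
        # Present in both, but the sizes disagree.
        "size_mismatch": sorted(
            k for k in db_files if k in stored_files and db_files[k] != stored_files[k]
        ),
    }


def owner_of(key: str) -> str:
    """Extract the user UUID from a key laid out as '<uuid>/<file name>'."""
    return key.split("/", 1)[0]


db = {"u1/a.bin": 100, "u1/b.bin": 200, "u2/c.bin": 300}
storage = {"u1/a.bin": 100, "u1/b.bin": 250, "u2/d.bin": 50}
report = reconcile(db, storage)
print(report)
print({owner_of(k) for k in report["untracked"]})
```

Any non-empty list in the report is a reason to investigate, and `owner_of` tells you which user the discrepancy belongs to without storing names or emails in the key.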

So, it goes something like this:

The desktop app sends a message to your server asking to upload a file. Something like "I need to upload a 3.7 GB file". Before being sent, the message is encrypted with the public key held by that user.
Your server gets the message, decrypts it, checks the space available, looks up the proper provider in its database and retrieves the latest access_key/secret_key for that provider.
Your server sends something like "ALL_GOOD, upload to provider_AWS_S3, using THIS_ACCESS_KEY paired with THIS_SECRET_KEY". The message is encrypted using the private key.
The desktop app uploads the file directly to S3 using the provided keys.
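The server-side check in the steps above can be sketched as follows, with encryption and key lookup left out. The 5 GB quota comes from the question; the `DENIED` reply and the idea of reserving space for uploads in flight are assumptions added here, so that two simultaneous requests cannot both pass the check.

```python
from dataclasses import dataclass
from typing import Optional

QUOTA_BYTES = 5 * 10**9  # 5 GB per user, as in the question


@dataclass
class Account:
    used_bytes: int = 0      # space taken by completed uploads
    reserved_bytes: int = 0  # space promised to uploads still in flight (assumption)


def authorize_upload(account: Account, size_bytes: int, quota: int = QUOTA_BYTES) -> str:
    """Return 'ALL_GOOD' and reserve the space, or 'DENIED' if the quota would be exceeded."""
    if size_bytes <= 0:
        return "DENIED"
    if account.used_bytes + account.reserved_bytes + size_bytes > quota:
        return "DENIED"
    account.reserved_bytes += size_bytes
    return "ALL_GOOD"


def client_may_upload(reply: Optional[str]) -> bool:
    """Fail closed: anything other than an explicit 'ALL_GOOD', including no reply, blocks the upload."""
    return reply == "ALL_GOOD"


account = Account(used_bytes=1 * 10**9)
first = authorize_upload(account, 3_700_000_000)   # fits: 4.7 GB of 5 GB
second = authorize_upload(account, 3_700_000_000)  # would exceed the quota
print(first, second, client_may_upload(None))
```

Once an upload finishes and is confirmed, the reserved amount would move to `used_bytes`; if it never finishes, the reservation should eventually be released.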

Downloads and other operations should be handled in a similar manner.

This is a great use case for serverless (Lambda on AWS, Google Cloud Functions, etc.), which should reduce costs and provide increased redundancy and "uptime".

Improvements can be made, and there are pitfalls. Encrypting files client-side before upload would add an extra layer of security, for example. But this post is too long already.

There you go. That'll be $3000 :).