
How to design a multi-client preprocessing software pipeline using AWS?

My goal is to automate a preprocessing pipeline. The pipeline has three code blocks:

  1. Fetching the data, either through an API or by the client uploading a CSV file to an S3 bucket.

  2. Processing the data: my goal is to unify the data from the different clients into a single end schema.

  3. Storing the unified data in a database.

I know this is a very common kind of system, but I could not find the best design for it.

The requirements are:

  1. The system is not real time. For each client I plan to fetch new data every X days, and it does not matter if a run finishes even a day later.
  2. The processing part is unique to each client's data. Of course there are some common features, but there are also many client-specific features and manipulations.
  3. I want the system to be automated.
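Requirement 1 can be read as a per-client interval check. Below is a minimal sketch, assuming each client's settings are kept in a small config record (`ClientConfig`, its fields, and the helper names are hypothetical): a job that runs once a day asks which clients are due, and because a day's delay is acceptable, a due client is simply picked up on the next daily pass.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical per-client settings; the field names are illustrative."""
    client_id: str
    source: str          # "api" or "s3", matching the two fetch options
    interval_days: int   # the "X days" from requirement 1


def is_due(cfg: ClientConfig, last_run: Optional[date], today: date) -> bool:
    """A client is due if it never ran or at least interval_days have passed."""
    if last_run is None:
        return True
    return today - last_run >= timedelta(days=cfg.interval_days)


def due_clients(configs: List[ClientConfig], last_runs: Dict[str, date], today: date) -> List[str]:
    """Return the ids of the clients whose fetch is due on `today`."""
    return [c.client_id for c in configs if is_due(c, last_runs.get(c.client_id), today)]


if __name__ == "__main__":
    configs = [
        ClientConfig("client_a", "api", 7),
        ClientConfig("client_b", "s3", 3),
        ClientConfig("client_c", "api", 1),
    ]
    last_runs = {"client_a": date(2024, 1, 1), "client_b": date(2024, 1, 6)}
    print(due_clients(configs, last_runs, today=date(2024, 1, 8)))  # prints ['client_a', 'client_c']
```

This assumes `last_run` is only updated after a successful run, so a failed or skipped client stays due until it completes.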

I thought of the following options:

  1. The Lambda solution: schedule one Lambda per client that fetches the data every X days, and have it trigger a second Lambda that does the processing. But with 100 clients, managing 200 Lambdas would be awful.

  2. 2.1 Create a project called API with a separate script for each client, and a schedule for each script on EC2 or ECS.

     2.2 Create another project called Processing, where a parent class holds the common code and every client subclass inherits from it. The API script triggers the relevant processing script.
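Option 2.2 combines well with a single entry point: a parent class holds the shared steps, each client gets a subclass, and one function picks the subclass from a `client_id` in its input, so one Lambda (or one container task) could serve every client instead of one per client. A minimal sketch, assuming CSV input and a two-column unified schema; `BaseProcessor`, `register`, `PROCESSORS`, `handler`, the column names, and the event keys are all hypothetical.

```python
import csv
import io
from abc import ABC, abstractmethod

# Hypothetical unified end schema; replace with the real target columns.
UNIFIED_FIELDS = ("customer_id", "amount")

# Registry: client_id -> processor class, filled by the @register decorator.
PROCESSORS = {}


def register(client_id: str):
    """Class decorator that records a subclass under its client id."""
    def wrap(cls):
        cls.client_id = client_id
        PROCESSORS[client_id] = cls
        return cls
    return wrap


class BaseProcessor(ABC):
    """Parent class with the code shared by all clients."""
    client_id = ""

    def process(self, raw_csv: str) -> list:
        out = []
        for row in csv.DictReader(io.StringIO(raw_csv)):  # common: parse CSV
            mapped = self.to_unified(row)                  # client-specific mapping
            missing = [f for f in UNIFIED_FIELDS if f not in mapped]
            if missing:                                    # common: validate schema
                raise ValueError(f"{self.client_id}: missing fields {missing}")
            out.append({"client_id": self.client_id, **{f: mapped[f] for f in UNIFIED_FIELDS}})
        return out

    @abstractmethod
    def to_unified(self, row: dict) -> dict:
        """Map one raw row onto UNIFIED_FIELDS."""


@register("client_a")
class ClientAProcessor(BaseProcessor):
    def to_unified(self, row):
        return {"customer_id": row["id"], "amount": float(row["total"])}


@register("client_b")
class ClientBProcessor(BaseProcessor):
    def to_unified(self, row):
        # This hypothetical client reports amounts in cents.
        return {"customer_id": row["cust"], "amount": int(row["cents"]) / 100}


def handler(event, context=None):
    """Single generic entry point, e.g. one Lambda handler for every client.

    Assumes the event carries "client_id" and "raw_csv" (hypothetical keys).
    """
    processor = PROCESSORS[event["client_id"]]()
    return processor.process(event["raw_csv"])
```

With this layout, adding a client means adding one subclass rather than another scheduled Lambda or script.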

In the end I am very confused about what the best practice is. I have only found examples that handle a single client, or general approaches and block diagrams that are too broad. Because I know this is such a common system, I would appreciate learning from others' experience. Any reference links or advice would be appreciated.

Take a look at Step Functions. It lets you decouple the execution of each stage and reuse your Lambdas.

By passing input into the Step Functions state machine, the top Lambda can make decisions that feed into the others.
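A Step Functions state machine is described in Amazon States Language (JSON). The sketch below builds one as a Python dict, assuming three reusable Lambdas (fetch, process, store) whose ARNs are placeholders, and a hypothetical `has_new_data` field returned by the fetch Lambda that a Choice state uses to decide whether the later stages run. The same definition serves every client; only the execution input (for example `{"client_id": "client_a"}`) changes.

```python
import json

# Placeholder ARNs (hypothetical): substitute the ARNs of your own functions.
FETCH_ARN = "arn:aws:lambda:us-east-1:123456789012:function:fetch"
PROCESS_ARN = "arn:aws:lambda:us-east-1:123456789012:function:process"
STORE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:store"


def build_definition() -> dict:
    """States Language definition: Fetch, decide, then Process and Store.

    The execution input goes to the Fetch Lambda, and each Task's output
    becomes the next state's input.
    """
    return {
        "Comment": "Multi-client preprocessing pipeline (sketch)",
        "StartAt": "Fetch",
        "States": {
            "Fetch": {"Type": "Task", "Resource": FETCH_ARN, "Next": "CheckData"},
            # Branch on a field the Fetch Lambda is assumed to return.
            "CheckData": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.has_new_data", "BooleanEquals": True, "Next": "Process"}
                ],
                "Default": "NothingToDo",
            },
            "Process": {"Type": "Task", "Resource": PROCESS_ARN, "Next": "Store"},
            "Store": {"Type": "Task", "Resource": STORE_ARN, "End": True},
            "NothingToDo": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```

The JSON string from `json.dumps(build_definition())` is what you would supply as the state machine definition, for example through the console, CloudFormation, or boto3's `create_state_machine(name=..., definition=..., roleArn=...)`.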

To schedule this, use a scheduled CloudWatch Events rule (now part of Amazon EventBridge).
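A scheduled rule can start the state machine with a per-client input. A minimal sketch, assuming one rule per client that all target the same state machine; the ARNs, the rule naming scheme, and the helper names are hypothetical. It only builds the keyword arguments for boto3's EventBridge `put_rule` and `put_targets`; `apply` sends them through a client you pass in (e.g. `boto3.client("events")`). The role must allow EventBridge to call `states:StartExecution`.

```python
import json

# Placeholder ARNs (hypothetical).
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:preprocess"
EVENTS_ROLE_ARN = "arn:aws:iam::123456789012:role/events-start-pipeline"


def rate_expression(days: int) -> str:
    """EventBridge rate syntax: singular unit for 1, plural otherwise."""
    if days < 1:
        raise ValueError("days must be >= 1")
    return f"rate({days} {'day' if days == 1 else 'days'})"


def schedule_requests(client_id: str, interval_days: int):
    """Build put_rule / put_targets keyword arguments for one client."""
    rule_name = f"preprocess-{client_id}"  # hypothetical naming scheme
    rule = {
        "Name": rule_name,
        "ScheduleExpression": rate_expression(interval_days),
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [{
            "Id": "start-pipeline",
            "Arn": STATE_MACHINE_ARN,
            "RoleArn": EVENTS_ROLE_ARN,
            # Constant JSON passed to the state machine as its execution input.
            "Input": json.dumps({"client_id": client_id}),
        }],
    }
    return rule, targets


def apply(events_client, client_id: str, interval_days: int) -> None:
    """Create or update the rule and its target via an EventBridge client."""
    rule, targets = schedule_requests(client_id, interval_days)
    events_client.put_rule(**rule)
    events_client.put_targets(**targets)
```

Every rule points at the same state machine and differs only in its schedule and input, so adding a client adds configuration rather than new code.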


