
ELK stack (Elasticsearch, Logstash, Kibana) - is logstash a necessary component?

We're currently processing daily mobile app log data with an AWS Lambda and posting it into Redshift. The Lambda structures the data, but it is essentially still raw. The next step is to do some actual processing of the log data, such as sessionization, for reporting purposes. The final step is to do feature engineering, and then use the data for model training.

The steps are:

  1. Structure the raw data for storage
  2. Sessionize the data for reporting
  3. Feature engineering for modeling

For step 2, I am looking at using QuickSight and/or Kibana to create a reporting dashboard. But the typical stack, as I understand it, is to do the log processing with Logstash, then send it to Elasticsearch and finally to Kibana/QuickSight. Since we're already handling the initial log processing through Lambda, is it possible to skip this step and pass it directly into Elasticsearch? If so, where does this happen - in the Lambda function, or from Redshift after it has been stored in a table? Or can Elasticsearch just read it from the same S3 bucket where I'm posting the data for ingestion into a Redshift table?

Elasticsearch uses JSON to perform all operations. For example, to add a document to an index, you use a PUT operation (copied from the docs):

PUT twitter/_doc/1
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

Logstash exists to collect log messages, transform them into JSON, and make these PUT requests. However, anything that produces correctly-formatted JSON and can perform an HTTP PUT will work. If you already invoke Lambdas to transform your S3 content, then you should be able to adapt them to write JSON to Elasticsearch. I'd use separate Lambdas for Redshift and Elasticsearch, simply to improve manageability.
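
A minimal sketch of what that could look like, assuming the Lambda is triggered by an S3 put and the cluster accepts unauthenticated requests (see the note on AWS-hosted clusters below); the endpoint and index name are placeholders, and the requests library must be bundled with the deployment package:

import json
import urllib.parse

import boto3
import requests

ES_ENDPOINT = "https://my-es-cluster.example.com:9200"  # placeholder endpoint
INDEX = "applogs"                                       # placeholder index name

s3 = boto3.client("s3")

def handler(event, context):
    # Fetch the S3 object that triggered the Lambda
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    # One JSON document per log line; json.loads stands in for your
    # existing structuring step
    for line in body.splitlines():
        doc = json.loads(line)
        # The same operation as the twitter example above, one document per
        # call; POSTing without an explicit _id lets Elasticsearch generate one
        resp = requests.post(
            f"{ES_ENDPOINT}/{INDEX}/_doc",
            json=doc,
            headers={"Content-Type": "application/json"},
        )
        resp.raise_for_status()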

Performance tip: you're probably processing lots of records at a time, in which case the bulk API will be more efficient than individual PUTs. However, there is a limit on the size of a request, so you'll need to batch your input.
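
As a sketch of that batching, assuming the same placeholder endpoint and index as above and an ES 7-style typeless _bulk endpoint; the batch size is an arbitrary starting point to tune against your cluster's request-size limit (http.max_content_length, 100mb by default):

import json
import requests

ES_ENDPOINT = "https://my-es-cluster.example.com:9200"  # placeholder endpoint
INDEX = "applogs"                                       # placeholder index name
BATCH_SIZE = 500                                        # arbitrary; tune against the size limit

def bulk_index(docs):
    for start in range(0, len(docs), BATCH_SIZE):
        # The bulk body is NDJSON: an action line, then the document,
        # repeated, with a trailing newline
        lines = []
        for doc in docs[start:start + BATCH_SIZE]:
            lines.append(json.dumps({"index": {}}))
            lines.append(json.dumps(doc))
        payload = "\n".join(lines) + "\n"
        resp = requests.post(
            f"{ES_ENDPOINT}/{INDEX}/_bulk",
            data=payload,
            headers={"Content-Type": "application/x-ndjson"},
        )
        resp.raise_for_status()
        # _bulk returns 200 even when individual items fail, so check the errors flag
        if resp.json().get("errors"):
            raise RuntimeError("some documents were rejected by the bulk request")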

Also: you don't say whether you're using an AWS-managed Elasticsearch cluster or a self-managed one. If the former, you'll also have to deal with authenticated requests, or use an IP-based access policy on the cluster. You don't say what language your Lambdas are written in, but if it's Python you can use the aws-requests-auth library to make authenticated requests.
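
For the Python case, a sketch of a signed request with aws-requests-auth, assuming an AWS-hosted domain (the host, region, and index are placeholders); BotoAWSRequestsAuth picks up the Lambda execution role's credentials via boto3:

import requests
from aws_requests_auth.boto_utils import BotoAWSRequestsAuth

ES_HOST = "search-mydomain-abc123.us-east-1.es.amazonaws.com"  # placeholder domain

# Signs each request with SigV4 using the execution role's credentials
auth = BotoAWSRequestsAuth(
    aws_host=ES_HOST,
    aws_region="us-east-1",  # placeholder region
    aws_service="es",
)

resp = requests.post(
    f"https://{ES_HOST}/applogs/_doc",
    json={"user": "kimchy", "message": "trying out Elasticsearch"},
    auth=auth,
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()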
