
Amazon AWS - S3 to ElasticSearch (Python Lambda)

I'd like to copy data from an S3 directory to the Amazon ElasticSearch service. I've tried following the guide, but unfortunately the part I'm looking for is missing. I don't know what the Lambda function itself should look like (all the guide says about this is: "Place your application source code in the eslambda folder."). I'd like ES to auto-index the files.

Currently I'm trying:

for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = urllib.unquote_plus(record['s3']['object']['key'])
    index_name = event.get('index_name', key.split('/')[0])
    object = s3_client.Object(bucket, key)

    data = object.get()['Body'].read()

    helpers.bulk(es, data, chunk_size=100)

But I get a massive error stating elasticsearch.exceptions.RequestError: TransportError(400, u'action_request_validation_exception', u'Validation Failed: 1: index is missing;2: type is missing;3: index is missing;4: type is missing;5: index is missing;6: type is missing;7: ...

Could anyone explain to me how I can set things up so that my data gets moved from S3 to ES, where it gets auto-mapped and auto-indexed? Apparently it's possible, as mentioned in the references here and here.

While mapping can automatically be assigned in Elasticsearch, the indexes are not automatically generated. You have to specify the index name and type in the POST request. If that index does not exist, then Elasticsearch will create the index automatically.

Based on your error, it looks like you're not passing an index and type.

For example, here's a simple POST request that adds a record to the index MyIndex and type MyType, creating the index and type first if they do not already exist.

curl -XPOST 'example.com:9200/MyIndex/MyType/' \ 
    -d '{"name":"john", "tags" : ["red", "blue"]}'
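The same fix applies to the `helpers.bulk` call in the question: each document can carry its own `_index` and `_type` fields. A minimal sketch (the `make_actions` helper and the `raw` field are illustrative names, not part of the elasticsearch client):

```python
def make_actions(lines, index_name, doc_type="my-type"):
    """Yield one bulk action per input line, each naming its index and type,
    which is exactly what the 'index is missing; type is missing' error wants."""
    for line in lines:
        yield {
            "_index": index_name,
            "_type": doc_type,
            "_source": {"raw": line},
        }

# Then, instead of passing raw bytes to helpers.bulk:
# helpers.bulk(es, make_actions(data.splitlines(), index_name))
```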

I wrote a script to download a csv file from S3 and then transfer the data to ES.

  1. Made an S3 client using boto3 and downloaded the file from S3
  2. Made an ES client to connect to Elasticsearch.
  3. Opened the csv file and used the helpers module from elasticsearch to insert the csv file contents into Elasticsearch.

main.py

import boto3
from elasticsearch import helpers, Elasticsearch
import csv
import os
from config import *


# S3: download the file named by Prefix from the bucket
Downloaded_Filename = os.path.basename(Prefix)
s3 = boto3.client('s3', aws_access_key_id=awsaccesskey,
                  aws_secret_access_key=awssecretkey, region_name=awsregion)
s3.download_file(Bucket, Prefix, Downloaded_Filename)

# ES: derive the index name from the file name and connect
ES_index = Downloaded_Filename.split(".")[0]
ES_client = Elasticsearch([ES_host], http_auth=(ES_user, ES_password), port=ES_port)

# S3 to ES: bulk-insert each csv row as a document
with open(Downloaded_Filename) as f:
    reader = csv.DictReader(f)
    helpers.bulk(ES_client, reader, index=ES_index, doc_type='my-type')

config.py

awsaccesskey = ""
awssecretkey = ""
awsregion = "us-east-1"
Bucket=""
Prefix=''
ES_host = "localhost"
ES_port = "9200"
ES_user = "elastic"
ES_password = "changeme"
