
Input data to AWS Elasticsearch using Glue

I'm looking for a way to insert data into AWS Elasticsearch using AWS Glue (Python or PySpark). I have looked at the Boto3 SDK for Elasticsearch but could not find any function to insert data into Elasticsearch. Can anyone help me find a solution? Any useful links or code?
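(A note on the Boto3 point: as far as I know, the Boto3 `es` client only manages Elasticsearch domains; indexing documents goes through the cluster's REST API instead, e.g. the `_bulk` endpoint. A minimal sketch of building a `_bulk` payload by hand; the index name and documents are hypothetical:)

```python
import json

def build_bulk_body(index, docs):
    """Build an Elasticsearch _bulk payload (NDJSON): one action line
    followed by one source line per document, newline-terminated."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# POST this body to https://<domain-endpoint>/_bulk with header
# Content-Type: application/x-ndjson (e.g. using the requests library).
body = build_bulk_body("my-index", [{"id": 1}, {"id": 2}])
```

This works for small ad-hoc loads, but for bulk loading from Glue the connector approach in the answer below scales much better.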

For AWS Glue you need to add an additional jar to the job.

  1. Download the jar from https://repo1.maven.org/maven2/org/elasticsearch/elasticsearch-hadoop/7.8.0/elasticsearch-hadoop-7.8.0.jar
  2. Save the jar on S3 and pass it to the Glue job.
  3. Now, when saving the DataFrame, use the following:
df.write.format("org.elasticsearch.spark.sql").\
         option("es.resource", "index/document").\
         option("es.nodes", host).\
         option("es.port", port).\
         save()
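Putting the pieces together, here is a rough sketch of how the write options might be assembled in a Glue PySpark script. The host, port, and index/document names are placeholders, not values from the original answer:

```python
def es_write_options(host, port, resource, wan_only=True):
    """Assemble elasticsearch-hadoop connector options for df.write.
    `resource` is "index/document" in the 7.x-era connector syntax."""
    opts = {
        "es.resource": resource,
        "es.nodes": host,
        "es.port": str(port),
    }
    if wan_only:
        # Needed for AWS managed Elasticsearch, where only the declared
        # endpoint is reachable (no direct node discovery).
        opts["es.nodes.wan.only"] = "true"
    return opts

# Inside the Glue job (df is a Spark DataFrame):
# df.write.format("org.elasticsearch.spark.sql") \
#     .options(**es_write_options("search-mydomain.us-east-1.es.amazonaws.com",
#                                 443, "my-index/_doc")) \
#     .mode("append") \
#     .save()
```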

If you are using AWS managed Elasticsearch, try setting this to true:

option("es.nodes.wan.only", "true")

For more properties, check https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html
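For step 2 above (passing the jar to the job), Glue picks up extra jars through the `--extra-jars` special job parameter. A sketch of creating such a job with boto3; the job name, IAM role, script path, and jar path are all placeholders:

```python
def glue_job_args(script_path, jar_path):
    """Build create_job keyword arguments that attach an extra jar to a
    Glue Spark job via the --extra-jars special parameter."""
    return {
        "Name": "es-load-job",                       # placeholder name
        "Role": "MyGlueServiceRole",                 # placeholder IAM role
        "Command": {"Name": "glueetl",
                    "ScriptLocation": script_path,
                    "PythonVersion": "3"},
        "DefaultArguments": {"--extra-jars": jar_path},
    }

# import boto3
# boto3.client("glue").create_job(
#     **glue_job_args("s3://my-bucket/scripts/es_load.py",
#                     "s3://my-bucket/jars/elasticsearch-hadoop-7.8.0.jar"))
```

The same `--extra-jars` argument can also be set in the Glue console under the job's "Dependent jars path" field.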

NOTE: The elasticsearch-spark connector is compatible with Spark 2.3 only, as it is prebuilt on Scala 2.11, while Spark 2.4 and Spark 3.0 are prebuilt on Scala 2.12.

