
How to connect AWS Glue with Amazon DocumentDB

Does anyone know how to connect AWS Glue with Amazon DocumentDB?

Unfortunately, the AWS blog post I followed [https://aws.amazon.com/blogs/big-data/building-aws-glue-spark-etl-jobs-using-amazon-documentdb-with-mongodb-compatibility-and-mongodb/] is not working as expected.

Steps I followed:

  • Created a DocumentDB security group that opens port 27017
  • Created a DocumentDB cluster in the default VPC of my personal account
  • Connected to DocumentDB from Cloud9 and created a document in the test database
  • Created a DocumentDB connection in the Glue Data Catalog, choosing the default VPC security group and the DocumentDB security group while creating it
  • Created an S3 VPC endpoint
  • Created a Glue VPC endpoint
  • Created a job using the script given in the blog, but it fails at the line [dynamic_frame2 = glueContext.create_dynamic_frame.from_options]
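
For reference, the failing call has roughly the shape below. This is a minimal sketch, not the blog's exact script: the helper names `build_documentdb_options` and `read_collection` are hypothetical, the host, database, collection, and credentials are placeholders, and the option keys are the ones I understand Glue's "documentdb" connection type to accept.

```python
def build_documentdb_options(host, port, database, collection, username, password):
    """Build a connection_options dict for Glue's "documentdb" connection type.

    All values are placeholders supplied by the caller; nothing here is
    taken from the original job.
    """
    return {
        "uri": f"mongodb://{host}:{port}",
        "database": database,
        "collection": collection,
        "username": username,
        "password": password,
        # DocumentDB clusters commonly have TLS enabled; adjust if yours does not.
        "ssl": "true",
        "ssl.domain_match": "false",
    }


def read_collection(glue_context, options):
    """Read one collection into a DynamicFrame.

    glue_context is expected to be an awsglue.context.GlueContext, which is
    only available inside a Glue job run.
    """
    return glue_context.create_dynamic_frame.from_options(
        connection_type="documentdb",
        connection_options=options,
    )
```

Inside the job this would be called as `read_collection(glueContext, build_documentdb_options(...))`. The options only describe where the cluster is; they do not by themselves place the job in a network that can reach it.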

Error: An error occurred while calling o92.getDynamicFrame. Timed out after 30000 ms while waiting to connect. Client view of cluster state is {type=UNKNOWN, servers=[{address=<cluster>:<port>, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.SocketTimeoutException: connect timed out}}]
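
A `SocketTimeoutException` like this one suggests a network-level problem rather than bad credentials. One way to confirm that is a plain TCP check against the cluster endpoint, run from the same environment as the failing code. The sketch below uses only the Python standard library; the function name `can_connect` and the 5-second default timeout are my own choices, not part of the blog script.

```python
import socket


def can_connect(host, port=27017, timeout_s=5.0):
    """Return True if a TCP connection to host:port opens within timeout_s seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

If this returns False from within the Glue job but True from Cloud9, the job is most likely not running in a network that can reach the cluster.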

I figured it out. Since this job was created using the Spark script editor, the DB connection has to be attached to the job explicitly. Open the job script, go to the Job details tab, expand Advanced properties, and select the DocumentDB connection from the drop-down. After that it worked.
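
To check the same thing without the console, the job definition can be inspected for attached connections. As far as I understand, attaching the connection is what lets the job run with that connection's VPC subnet and security groups, which would explain the earlier timeout. The sketch below assumes the response shape of boto3's Glue `get_job` call, where connection names sit under `Job["Connections"]["Connections"]`. The helper names and the job name in the usage note are hypothetical.

```python
def attached_connections(job):
    """Return the connection names attached to a Glue job definition dict."""
    return list(job.get("Connections", {}).get("Connections", []))


def fetch_job(job_name):
    """Fetch a job definition from AWS Glue.

    Requires boto3 and AWS credentials; boto3 is imported lazily so that
    attached_connections can be used without it.
    """
    import boto3

    return boto3.client("glue").get_job(JobName=job_name)["Job"]
```

For example, `attached_connections(fetch_job("my-docdb-job"))` returns an empty list when no connection has been selected for the job.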


 