Is there any way to write data from azure databricks to azure cosmos db GREMLIN API

I am trying to write vertices and edges to the Cosmos DB Gremlin API through Azure Databricks, but unfortunately I am facing an error. I tried different cluster versions and Maven libraries, still with no luck.

Databricks runtime: 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)

Maven library installed: com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12:4.11.1

This is the document I followed:

https://github.com/Azure/azure-cosmosdb-spark#using-databricks-notebooks

There might be a library conflict happening, because the document only lists configurations for older versions. If anyone has come across this, kindly help.

cosmosDbConfig = {
  "Endpoint" : "https://xxxxxxxx.gremlin.documents.azure.com:443/",
  "Masterkey" : "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
  "Database" : "sample-database",
  "Collection" : "sample-graph",
  "Upsert" : "true"
}

cosmosDbFormat = "com.microsoft.azure.cosmosdb.spark"

(cosmosDbVertices.write.format(cosmosDbFormat).mode("append").options(**cosmosDbConfig).save())

Error: 
Py4JJavaError: An error occurred while calling o1113.save.
: java.lang.ClassNotFoundException: 
Failed to find data source: com.microsoft.azure.cosmosdb.spark. Please find packages at
http://spark.apache.org/third-party-projects.html
       
    at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:557)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:758)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:808)
    at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:983)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:293)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:258)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)



I tried to reproduce the same thing in my environment, and I got the same error.


To resolve this error, install the com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12:4.12.2 library and follow the code below.

Code:

cosEndpoint = "https://xxxxxx.dxx.azure.com:443/"
cosMasterkey = "xxxx"
cosDatabase = "xxxx"
cosContainer = "xxxx"

cfg1 = {
  "spark.cosmos.accountEndpoint" : cosEndpoint,
  "spark.cosmos.accountKey" : cosMasterkey,
  "spark.cosmos.database" : cosDatabase,
  "spark.cosmos.container" : cosContainer,
}

# Sample dataframe
cosmosDbVertices = spark.createDataFrame(
    [("ss1", "cat", 2, True), ("cc1", "dog", 2, False)],
    ["id", "name", "age", "isAlive"])

# Writing data into Cosmos DB via the Spark 3 connector
cosmosDbVertices.write.format("cosmos.oltp").options(**cfg1).mode("append").save()
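Note that `cosmos.oltp` writes plain documents, so for a Gremlin graph the rows must already be shaped the way the Gremlin store represents vertices and edges internally. A minimal sketch of building an edge row follows; the field names (`_isEdge`, `_sink`, `_sinkLabel`, `_vertexId`, `_vertexLabel`) follow the graph samples in the older azure-cosmosdb-spark repo and should be verified against documents in your own graph container before use:

```python
import uuid

def make_edge_doc(label, src_id, src_label, dst_id, dst_label):
    """Build one edge document in the internal shape the Cosmos DB
    Gremlin store uses (sketch -- verify these field names against
    actual documents in your graph container)."""
    return {
        "id": str(uuid.uuid4()),     # every Cosmos document needs a unique id
        "label": label,              # edge label seen in Gremlin traversals
        "_isEdge": True,             # marks the document as an edge, not a vertex
        "_vertexId": src_id,         # id of the source vertex
        "_vertexLabel": src_label,   # label of the source vertex
        "_sink": dst_id,             # id of the target vertex
        "_sinkLabel": dst_label,     # label of the target vertex
    }

# Hypothetical edge between the two sample vertices written above
edge = make_edge_doc("knows", "ss1", "cat", "cc1", "dog")
```

Such edge rows can then be put into a dataframe and written with the same `cosmos.oltp` format and `cfg1` options used for the vertices.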

Output:

