Is there any way to write data from Azure Databricks to Azure Cosmos DB Gremlin API?
I am trying to write vertices and edges to the Cosmos DB Gremlin API from Azure Databricks, but I am getting an error. I have tried different versions of the cluster and of the Maven library, with no success.
Databricks cluster configuration: 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)
Maven library installed: com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12:4.11.1
This is the document I followed:
https://github.com/Azure/azure-cosmosdb-spark#using-databricks-notebooks
There might be a library conflict, because the document only describes configurations for older versions. If anyone has come across this, kindly help.
```python
cosmosDbConfig = {
    "Endpoint": "https://xxxxxxxx.gremlin.documents.azure.com:443/",
    "Masterkey": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "Database": "sample-database",
    "Collection": "sample-graph",
    "Upsert": "true"
}
cosmosDbFormat = "com.microsoft.azure.cosmosdb.spark"
cosmosDbVertices.write.format(cosmosDbFormat).mode("append").options(**cosmosDbConfig).save()
```
Error:
Py4JJavaError: An error occurred while calling o1113.save.
: java.lang.ClassNotFoundException:
Failed to find data source: com.microsoft.azure.cosmosdb.spark. Please find packages at
http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:557)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:758)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:808)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:983)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:293)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:258)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
I tried to reproduce the same thing in my environment, and I got the same error.
To resolve it, install the com.azure.cosmos.spark:azure-cosmos-spark_3-2_2-12:4.12.2 library and use the code below.
Code:
```python
cosEndpoint  = "https://xxxxxx.dxx.azure.com:443/"
cosMasterkey = "xxxx"
cosDatabase  = "xxxx"
cosContainer = "xxxx"

cfg1 = {
    "spark.cosmos.accountEndpoint": cosEndpoint,
    "spark.cosmos.accountKey": cosMasterkey,
    "spark.cosmos.database": cosDatabase,
    "spark.cosmos.container": cosContainer,
}

# Sample dataframe
cosmosDbVertices = spark.createDataFrame(
    [("ss1", "cat", 2, True), ("cc1", "dog", 2, False)]
).toDF("id", "name", "age", "isAlive")

# Write the data into Cosmos DB
cosmosDbVertices.write.format("cosmos.oltp").options(**cfg1).mode("append").save()
```
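One caveat: writing flat documents this way stores them in the container, but for the vertices to be traversable through the Gremlin API they may need to follow Cosmos DB's internal Gremlin storage format, in which vertex properties are arrays of `{id, _value}` objects. A sketch of building documents in that shape, assuming that format and a hypothetical partition-key field `pk` (adapt both to your container):

```python
import uuid

def make_vertex(vertex_id, label, pk, **props):
    """Build a dict shaped like Cosmos DB's internal Gremlin vertex format
    (an assumption: each property is stored as a list of {id, _value} entries)."""
    doc = {"id": vertex_id, "label": label, "pk": pk}
    for name, value in props.items():
        doc[name] = [{"id": str(uuid.uuid4()), "_value": value}]
    return doc

vertex = make_vertex("ss1", "cat", "pets", name="ss1", age=2)
```

Such dicts could then be turned into a dataframe with `spark.createDataFrame([...])` and written with the same `cosmos.oltp` format as above.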
Output: (screenshot of the successful write omitted)