Trying to Push Records from a Dataframe to a SQL Server Table
I am trying to find a way to push everything from a dataframe into a SQL Server table. I did some Googling and came up with this:
Cluster.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
    .option("dbtable", "dbo.Cluster") \
    .option("user", "u_name") \
    .option("password", "p_wd") \
    .save()
My dataframe is named 'Cluster' and it is of type pyspark.sql.dataframe.DataFrame. If the table exists in the DB, when I run the code above, I get this error message:
org.apache.spark.sql.AnalysisException: Table or view 'dbo.Cluster' already exists. SaveMode: ErrorIfExists.;
If I delete the table and run the code above, I get this error message:
java.lang.IllegalArgumentException: Can't get JDBC type for struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
This should be do-able. I've done this before in a Spark environment; that code worked perfectly fine. Now, I am working in a Python environment (using Databricks), and I'm stuck on this one thing.
Just to come full circle, I'm creating the dataframe this way:
df = spark.read \
    .jdbc("jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name", "dbo.table_name",
          properties={"user": "u_name", "password": "p_wd"})
Then I do some analytics on the data, eventually apply some ML algorithms to it, and come up with two dataframes:
df1 = df1.select("*").toPandas()
df2 = df2.select("*").toPandas()
Now, I need to concatenate these two dataframes together:
Cluster = pd.concat([df1, df2], axis=1)
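(One thing worth noting at this step: pd.concat returns a pandas DataFrame, which has no .write attribute, so the Spark JDBC writer cannot be called on it directly; the result has to be converted back to a Spark DataFrame first. A minimal sketch of that step, with hypothetical column names standing in for the real data:)

```python
import pandas as pd

# Stand-ins for df1/df2 after .toPandas(); column names are hypothetical
df1_pd = pd.DataFrame({"id": [1, 2], "label": [0, 1]})
df2_pd = pd.DataFrame({"score": [0.9, 0.1]})

# Side-by-side concatenation, as in the question
cluster_pd = pd.concat([df1_pd, df2_pd], axis=1)
print(list(cluster_pd.columns))  # ['id', 'label', 'score']

# pandas DataFrames have no .write; convert back to Spark before the JDBC save
# (requires an active SparkSession, e.g. the `spark` object in Databricks):
# Cluster = spark.createDataFrame(cluster_pd)
```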
Finally, I want to push this dataframe into a SQL Server table. Everything works totally fine until I eventually try to put everything from the dataframe into the table. That's the only issue I have here.
You may try a different SaveMode. Either Append, to add data:
Cluster.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
    .option("dbtable", "dbo.Cluster") \
    .option("user", "u_name") \
    .option("password", "p_wd") \
    .mode("append") \
    .save()
or Overwrite, to replace the data:
Cluster.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
    .option("dbtable", "dbo.Cluster") \
    .option("user", "u_name") \
    .option("password", "p_wd") \
    .mode("overwrite") \
    .save()
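As for the second error, struct&lt;type:tinyint,size:int,indices:array&lt;int&gt;,values:array&lt;double&gt;&gt; is how Spark ML's vector type (VectorUDT) serializes, which suggests the dataframe still carries an ML feature/vector column; JDBC has no SQL type to map it to, so it must be dropped or flattened before writing. A minimal sketch, assuming the offending column is named "features" (the name and dimension are hypothetical); this is not tested against the asker's data:

```python
from pyspark.sql.functions import col
from pyspark.ml.functions import vector_to_array  # Spark 3.0+

def flatten_vector_column(df, col_name, dim):
    """Expand a Spark ML vector column into `dim` plain double columns,
    which JDBC can map, then drop the original vector column."""
    arr = vector_to_array(col(col_name))
    for i in range(dim):
        df = df.withColumn(f"{col_name}_{i}", arr[i])
    return df.drop(col_name)

# Either flatten the vector into scalar columns...
# Cluster = flatten_vector_column(Cluster, "features", dim=3)
# ...or simply drop it if it isn't needed in SQL Server:
# Cluster = Cluster.drop("features")
```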