Trying to Push Records from a Dataframe to a SQL Server Table
I am trying to find a way to push everything from a dataframe into a SQL Server table. I did some Googling and came up with this:
Cluster.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
    .option("dbtable", "dbo.Cluster") \
    .option("user", "u_name") \
    .option("password", "p_wd") \
    .save()
My dataframe is named 'Cluster' and it is of type pyspark.sql.dataframe.DataFrame. If the table exists in the DB, when I run the code above, I get this error message:
org.apache.spark.sql.AnalysisException: Table or view 'dbo.Cluster' already exists. SaveMode: ErrorIfExists.;
If I delete the table and run the code above, I get this error message:
java.lang.IllegalArgumentException: Can't get JDBC type for struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
This should be do-able. I've done this before in a Spark environment; that code worked perfectly fine. Now, I am working in a Python environment (using Databricks), and I'm stuck on this one thing.
Just to come full circle, I'm creating the dataframe this way:
df = spark.read \
    .jdbc("jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name", "dbo.table_name",
          properties={"user": "u_name", "password": "p_wd"})
Then I do some analytics on the data, eventually apply some ML algorithms to it, and come up with two dataframes:
df1 = df1.select("*").toPandas()
df2 = df2.select("*").toPandas()
Now, I need to concatenate these two dataframes together:
Cluster = pd.concat([df1, df2], axis=1)
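(One thing worth noting at this step: pd.concat returns a pandas DataFrame, which has no .write attribute, so the Spark JDBC writer cannot be called on it directly; the result has to be converted back to a Spark DataFrame first. A minimal sketch of that step, with hypothetical column names standing in for the real data:)

```python
import pandas as pd

# Stand-ins for df1/df2 after .toPandas(); column names are hypothetical
df1_pd = pd.DataFrame({"id": [1, 2], "label": [0, 1]})
df2_pd = pd.DataFrame({"score": [0.9, 0.1]})

# Side-by-side concatenation, as in the question
cluster_pd = pd.concat([df1_pd, df2_pd], axis=1)
print(list(cluster_pd.columns))  # ['id', 'label', 'score']

# pandas DataFrames have no .write; convert back to Spark before the JDBC save
# (requires an active SparkSession, e.g. the `spark` object in Databricks):
# Cluster = spark.createDataFrame(cluster_pd)
```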
Finally, I want to push this dataframe into a SQL Server table. Everything works totally fine until I eventually try to put everything from the dataframe into the table. That's the only issue I have here.
You may try a different SaveMode. Either Append, to add data:
Cluster.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
    .option("dbtable", "dbo.Cluster") \
    .option("user", "u_name") \
    .option("password", "p_wd") \
    .mode("append") \
    .save()
or Overwrite, to replace the data:
Cluster.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
    .option("dbtable", "dbo.Cluster") \
    .option("user", "u_name") \
    .option("password", "p_wd") \
    .mode("overwrite") \
    .save()
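As for the second error, struct&lt;type:tinyint,size:int,indices:array&lt;int&gt;,values:array&lt;double&gt;&gt; is how Spark ML's vector type (VectorUDT) serializes, which suggests the dataframe still carries an ML feature/vector column; JDBC has no SQL type to map it to, so it must be dropped or flattened before writing. A minimal sketch, assuming the offending column is named "features" (the name and dimension are hypothetical); this is not tested against the asker's data:

```python
from pyspark.sql.functions import col
from pyspark.ml.functions import vector_to_array  # Spark 3.0+

def flatten_vector_column(df, col_name, dim):
    """Expand a Spark ML vector column into `dim` plain double columns,
    which JDBC can map, then drop the original vector column."""
    arr = vector_to_array(col(col_name))
    for i in range(dim):
        df = df.withColumn(f"{col_name}_{i}", arr[i])
    return df.drop(col_name)

# Either flatten the vector into scalar columns...
# Cluster = flatten_vector_column(Cluster, "features", dim=3)
# ...or simply drop it if it isn't needed in SQL Server:
# Cluster = Cluster.drop("features")
```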