Trying to Push Records from a Dataframe to a SQL Server Table

I am trying to find a way to push everything from a dataframe into a SQL Server table. I did some Googling and came up with this.

Cluster.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
    .option("dbtable", "dbo.Cluster") \
    .option("user", "u_name") \
    .option("password", "p_wd") \
    .save()

My dataframe is named 'Cluster' and it is of type pyspark.sql.dataframe.DataFrame. If the table already exists in the DB when I run the code above, I get this error message:

org.apache.spark.sql.AnalysisException: Table or view 'dbo.Cluster' already exists. SaveMode: ErrorIfExists.;

If I delete the table and run the code above, I get this error message:

java.lang.IllegalArgumentException: Can't get JDBC type for struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
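
The struct named in this second message (type, size, indices, values) matches the layout Spark ML uses to store a vector column, which would fit a dataframe that has been through ML stages. As a minimal sketch, assuming the goal is simply to find which columns a JDBC sink is unlikely to accept, the hypothetical helper below inspects (name, type) pairs in the shape a PySpark DataFrame's `dtypes` attribute returns. The helper name, the prefix list, and the example column names are illustrative assumptions, not part of the original post.

```python
# Hypothetical helper: flag columns whose Spark type a JDBC sink is unlikely to accept.
# `dtypes` is a list of (column_name, type_string) pairs, the shape exposed by
# a PySpark DataFrame's `dtypes` attribute.
COMPLEX_PREFIXES = ("struct", "array", "map", "vector")


def non_jdbc_columns(dtypes):
    """Return the names of columns with nested or vector types."""
    return [
        name
        for name, type_string in dtypes
        if type_string.lower().startswith(COMPLEX_PREFIXES)
    ]


# Hand-written schema for illustration; these column names are made up.
example_dtypes = [
    ("id", "int"),
    ("features", "vector"),
    ("prediction", "int"),
]
bad = non_jdbc_columns(example_dtypes)
print(bad)

# With a real Spark DataFrame, one option is to drop the flagged columns first:
# Cluster.drop(*non_jdbc_columns(Cluster.dtypes)).write ...
```

Dropping the column is only one option; if the vector values are needed in the table, they would have to be expanded into plain numeric columns (or serialized to a string) before the write.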

This should be do-able. I've done this before in a Spark environment; that code worked perfectly fine. Now, I am working in a Python environment (using Databricks), and I'm stuck on this one thing.

Just to come full-circle, I'm creating the dataframe this way:

df = spark.read \
    .jdbc("jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name", "dbo.table_name",
          properties={"user": "u_name", "password": "p_wd"})

Then I do some analytics on the data, eventually apply some ML algos to it, and come up with two dataframes.

df1 = df1.select("*").toPandas()
df2 = df2.select("*").toPandas()

Now, I need to concatenate these two dataframes together.

Cluster = pd.concat([df1, df2], axis=1)

Finally, I want to push this dataframe into a SQL Server table. Everything works totally fine until I try to write the dataframe to the table. That's the only issue I have here.

You could try a different SaveMode. Use append to add rows to the existing table:

Cluster.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
    .option("dbtable", "dbo.Cluster") \
    .option("user", "u_name") \
    .option("password", "p_wd") \
    .mode("append") \
    .save()

Or use overwrite to replace the existing data:

Cluster.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
    .option("dbtable", "dbo.Cluster") \
    .option("user", "u_name") \
    .option("password", "p_wd") \
    .mode("overwrite") \
    .save()
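
Since the two snippets differ only in the mode, the connection settings can be built once and reused. The following is a rough sketch rather than a definitive implementation: `sqlserver_jdbc_options` is a hypothetical helper name, and the host, database, table, and credentials are the placeholders from the question.

```python
def sqlserver_jdbc_options(host, database, table, user, password, port=1433):
    """Build the option dict for a Spark JDBC write to SQL Server."""
    return {
        "url": f"jdbc:sqlserver://{host}:{port};databaseName={database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }


# Placeholder values taken from the question.
opts = sqlserver_jdbc_options(
    "name_here.database.windows.net", "db_name", "dbo.Cluster", "u_name", "p_wd"
)
print(opts["url"])

# With a Spark DataFrame, the same options can then be used with either mode:
# Cluster.write.format("jdbc").options(**opts).mode("append").save()
# Cluster.write.format("jdbc").options(**opts).mode("overwrite").save()
```

Keep in mind that with mode "overwrite" Spark by default drops and recreates the target table, so any column types or indexes defined by hand on dbo.Cluster would not be preserved.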

