In Python I am trying to create and write to the table TBL in the database DB in Databricks, but I get this exception: "A schema mismatch detected when writing to the Delta table". My code is as follows, where df is a pandas DataFrame:
from pyspark.sql import SparkSession

DB = "database_name"
TMP_TBL = "temporary_table"
TBL = "table_name"

sesh = SparkSession.builder.getOrCreate()
df_spark = sesh.createDataFrame(df)
df_spark.createOrReplaceTempView(TMP_TBL)
create_db_query = f"""
CREATE DATABASE IF NOT EXISTS {DB}
COMMENT "This is a database"
LOCATION "/tmp/{DB}"
"""
create_table_query = f"""
CREATE TABLE IF NOT EXISTS {DB}.{TBL}
USING DELTA
TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true, delta.autoOptimize.autoCompact = true)
COMMENT "This is a table"
LOCATION "/tmp/{DB}/{TBL}";
"""
insert_query = f"""
INSERT INTO {DB}.{TBL} SELECT * FROM {TMP_TBL}
"""
sesh.sql(create_db_query)
sesh.sql(create_table_query)
sesh.sql(insert_query)
The code fails at the last line, the insert_query. When I check, the database and table have been created but are of course empty. So the problem seems to be that TMP_TBL and TBL have different schemas. How and where do I define the schema so they match?
If the schema of your table differs from the schema of the DataFrame you are inserting, you will get this error. Make sure the two schemas are the same before performing the insert operation. I reproduced the same scenario in my environment with this approach:
ddl_query = """CREATE TABLE IF NOT EXISTS test123.emp_file
USING DELTA
LOCATION 'dbfs:/user/dem1231'
"""
spark.sql(ddl_query)

insert_query = """
INSERT INTO test123.emp_file SELECT * FROM temp_table
"""
spark.sql(insert_query)
Alternatively, try this approach to insert data into the table. I have a DataFrame with a predefined schema:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# sample dataframe
data = [
    ("vamsi", "1", "M", 2000),
    ("saideep", "2", "M", 3000),
    ("rakesh", "3", "M", 4000),
]

schema = StructType([
    StructField("firstname", StringType(), True),
    StructField("id", StringType(), True),
    StructField("gender", StringType(), True),
    StructField("salary", IntegerType(), True),
])

df = spark.createDataFrame(data=data, schema=schema)
Then use the write command with append mode to insert the DataFrame directly into the table:
df.write.mode("append").format("delta").saveAsTable("DB.TBL")
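If the mismatch is an added or widened column that you actually want the table to adopt, Delta's mergeSchema write option can relax the strict schema check on append. This is a sketch only, assuming DB.TBL already exists as a Delta table and that letting its schema evolve is acceptable:

```python
# Sketch: appends df and lets compatible schema changes (e.g. new columns)
# be merged into the table schema instead of raising a mismatch error.
(df.write
    .mode("append")
    .format("delta")
    .option("mergeSchema", "true")  # allow compatible schema evolution
    .saveAsTable("DB.TBL"))
```

Use this deliberately: it changes the table's schema, so it is not a fix for an accidental type mismatch you intend to correct on the DataFrame side.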