Change the datatype of a column in delta table

Is there a SQL command that I can use to change the datatype of an existing column in a Delta table? I need to change the column datatype from BIGINT to STRING. Below is the SQL command I'm trying to use, but with no luck.

%sql ALTER TABLE [TABLE_NAME] ALTER COLUMN [COLUMN_NAME] STRING

Error I'm getting:

org.apache.spark.sql.AnalysisException
ALTER TABLE CHANGE COLUMN is not supported for changing column 'bam_user' with type 
'IntegerType' to 'bam_user' with type 'StringType'

There is no option to change the data type of a column or to drop a column. Instead, you can read the data into a DataFrame, change the data type with withColumn() (and drop() if needed), and overwrite the table.

There is no real way to do this using SQL unless you copy the data to a different table altogether. That means INSERTing the data into a new table, DROPping the old table, and re-CREATEing it with the new structure, which is risky.
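For illustration, the copy-to-a-new-table route could be sketched roughly like this. The helper name, the `_new` suffix, the `USING DELTA` clause, and the table `my_table` with its `other_col` column are all assumptions made up for the example; only `bam_user` comes from the error message above. The helper just builds the statements, so they can be reviewed before anything is dropped.

```python
def build_retype_statements(table, columns, retype, suffix="_new"):
    """Return SQL statements that rebuild `table` with some columns cast.

    columns: ordered column names of the existing table.
    retype:  mapping of column name -> new SQL type, e.g. {"bam_user": "STRING"}.
    """
    unknown = set(retype) - set(columns)
    if unknown:
        raise ValueError(f"columns not in table: {sorted(unknown)}")

    # Cast only the listed columns; pass the others through unchanged.
    select_list = ", ".join(
        f"CAST({c} AS {retype[c]}) AS {c}" if c in retype else c
        for c in columns
    )
    new_table = f"{table}{suffix}"
    return [
        f"CREATE TABLE {new_table} USING DELTA AS SELECT {select_list} FROM {table}",
        f"DROP TABLE {table}",
        f"ALTER TABLE {new_table} RENAME TO {table}",
    ]


# Hypothetical table and column list, for illustration only.
statements = build_retype_statements(
    "my_table", ["bam_user", "other_col"], {"bam_user": "STRING"}
)
for stmt in statements:
    print(stmt)  # in a notebook you would run spark.sql(stmt) instead
```

Running the statements one at a time, and checking the new table before the DROP, limits the risk mentioned above: if the copy fails, the original table is still intact.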

The way to do this in Python is as follows:

Let's say this is your table:

CREATE TABLE person (id INT, name STRING, age INT, class INT, address STRING);

INSERT INTO person VALUES
  (100, 'John', 30, 1, 'Street 1'),
  (200, 'Mary', NULL, 1, 'Street 2'),
  (300, 'Mike', 80, 3, 'Street 3'),
  (400, 'Dan', 50, 4, 'Street 4');

You can check the table structure using the following:
DESCRIBE TABLE person

If you need to change the id column to STRING, this is the code:

%py
from pyspark.sql.functions import col

df = spark.read.table("person")

df1 = df.withColumn("id",col("id").cast("string"))

# Parentheses let the method chain span several lines
(df1.write
    .format("parquet")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("person"))

A couple of pointers: the format of this table is parquet. That was the default table format on Databricks runtimes before 8.0 (later runtimes default to Delta), so there you can omit the format line. Note that Python is very sensitive to whitespace.

On Databricks: if the table format is "delta", you must specify it.

Also, if the table is partitioned, it's important to say so in the code. For example:

(df1.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("col_to_partition1", "col_to_partition2")
    .option("overwriteSchema", "true")
    .save(table_location))

Here, table_location is the path where the Delta table is saved.

(some of this answer is based on this )

SQL doesn't support this, but it can be done in Python:

from pyspark.sql.functions import col

# set dataset location and columns with new types
table_path = '/mnt/dataset_location...'
types_to_change = {
  'column_1' : 'int',
  'column_2' : 'string',
  'column_3' : 'double'
}

# load to dataframe, change types
df = spark.read.format('delta').load(table_path)
for column in types_to_change:
  df = df.withColumn(column,col(column).cast(types_to_change[column]))
  
# save df with new types overwriting the schema
df.write.format("delta").mode("overwrite").option("overwriteSchema",True).save("dbfs:" + table_path)
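As a possible variation on the loop above, all the casts can be expressed as a single projection and passed to `selectExpr`, which avoids stacking one `withColumn` call per column. The helper name `cast_expressions` and the extra column `other` are made up for this sketch; the expression builder itself is plain Python.

```python
def cast_expressions(columns, types_to_change):
    """Build one SELECT expression per column, casting only the listed ones."""
    return [
        f"CAST(`{c}` AS {types_to_change[c]}) AS `{c}`"
        if c in types_to_change
        else f"`{c}`"
        for c in columns
    ]


# Hypothetical column list; with a real DataFrame you would pass df.columns.
exprs = cast_expressions(
    ["column_1", "column_2", "other"],
    {"column_1": "int", "column_2": "string"},
)
for e in exprs:
    print(e)

# With a real DataFrame (not run here):
#   df = df.selectExpr(*cast_expressions(df.columns, types_to_change))
```

Column order is preserved because the expressions are generated in the order of the input column list.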

Suppose you want to change the data type of column "column_name" in table "delta_table_name" to "int":

from pyspark.sql.functions import col

(spark.read.table("delta_table_name")
    .withColumn("column_name", col("column_name").cast("int"))  # "int" is the new data type
    .write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("delta_table_name"))

  1. Read the table using Spark.
  2. Use the withColumn method to transform the column you want.
  3. Write the table back with mode "overwrite" and overwriteSchema set to true.
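The three steps can also be wrapped in a small function, roughly as below. This is only a sketch: the function name `change_column_type` is invented here, and `spark` is assumed to be the SparkSession that a Databricks notebook already provides. It uses `df[column_name]` instead of `col(...)` so that no extra import is needed.

```python
def change_column_type(spark, table_name, column_name, new_type):
    """Rewrite a Delta table with one column cast to new_type."""
    # 1. Read the table.
    df = spark.read.table(table_name)

    # 2. Replace the column with a cast copy of itself.
    df = df.withColumn(column_name, df[column_name].cast(new_type))

    # 3. Write the table back, overwriting both the data and the schema.
    (
        df.write.format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .saveAsTable(table_name)
    )
```

In a notebook this would be called as `change_column_type(spark, "delta_table_name", "column_name", "int")`. Values that cannot be represented in the new type generally become NULL after a cast, so it is worth checking the data before overwriting.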

Reference: https://docs.databricks.com/delta/update-schema.html#explicitly-update-schema-to-change-column-type-or-name

from pyspark.sql import functions as F
spark.read.table("<TABLE NAME>") \
          .withColumn("<COLUMN NAME>", F.col("<COLUMN NAME>").cast("<DATA TYPE>")) \
          .write.format("delta").mode("overwrite").option("overwriteSchema",True).saveAsTable("<TABLE NAME>")

