I have a CSV in blob storage and I was able to read it as a Spark DataFrame in Databricks.
sourcefile = 'MiningProcess_Flotation_Plant_Database.csv'
df = spark.read.format('csv').option("header","true").load(db_ws.dp_engagement + '/' + sourcefile)
display(df)
I tried creating a table with this:
df.write.format("parquet").saveAsTable("MY_PERMANENT_TABLE_NAME")
And it works.
But all columns that contain numbers were created as strings.
So I tried to change the type:
%sql
ALTER TABLE MY_PERMANENT_TABLE_NAME CHANGE `% Iron Concentrate` TYPE decimal
But I get this error:
Error in SQL statement: AnalysisException: ALTER TABLE CHANGE COLUMN is not supported for changing column '% Iron Concentrate' with type 'StringType' to '% Iron Concentrate' with type 'DecimalType(10,0)'
Dataset is here:https://www.kaggle.com/code/sfbruno/mining-quality-xgboost/data
You neither specify the schema for your input data using .schema nor set .option("inferSchema", "true"), so the CSV reader assumes that all columns are of string type. If you don't want to specify a schema explicitly, add .option("inferSchema", "true") when reading the data.
You can't simply change the type using ALTER TABLE, especially between incompatible types such as string to number. So either re-read the data with the correct schema and write it back using .mode("overwrite"), or run create table <table_name> as select cast(...) ... from <table_name>
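For the second option, a sketch of the CTAS approach using the table and column names from the question (the target table name and the DECIMAL(10, 2) precision are assumptions; pick a precision wide enough for your data, or cast to DOUBLE):

```sql
-- Create a new table with the column cast to a numeric type;
-- afterwards you can drop the old table and rename this one.
CREATE TABLE MY_PERMANENT_TABLE_NAME_TYPED AS
SELECT CAST(`% Iron Concentrate` AS DECIMAL(10, 2)) AS `% Iron Concentrate`
       -- , cast the remaining numeric columns here as well
FROM MY_PERMANENT_TABLE_NAME;
```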
...