
How to change Databricks column type

I have a CSV in blob storage and I was able to read it as a Spark DataFrame in Databricks.

sourcefile = 'MiningProcess_Flotation_Plant_Database.csv'
df = spark.read.format('csv').option("header","true").load(db_ws.dp_engagement + '/' + sourcefile)
display(df)

I tried creating a table with this:

df.write.format("parquet").saveAsTable("MY_PERMANENT_TABLE_NAME")

And it works.

But all the columns that contain numbers were created as strings.

So I tried to change the type:

%sql
ALTER TABLE MY_PERMANENT_TABLE_NAME CHANGE `% Iron Concentrate` TYPE decimal

But I get this error:

Error in SQL statement: AnalysisException: ALTER TABLE CHANGE COLUMN is not supported for changing column '% Iron Concentrate' with type 'StringType' to '% Iron Concentrate' with type 'DecimalType(10,0)'

The dataset is here: https://www.kaggle.com/code/sfbruno/mining-quality-xgboost/data

You neither specify the schema for your input data using .schema nor set .option("inferSchema", "true"), so the CSV reader assumes that all columns are of string type. If you don't want to specify the schema explicitly, add .option("inferSchema", "true") when reading the data.
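A minimal sketch of both approaches, reusing the sourcefile and path variables from the question. The explicit schema is illustrative: `% Iron Concentrate` comes from the error message, while the other column name and the decimal precision are assumptions, not taken from the dataset.

# Let the CSV reader sample the file and infer column types
df = (spark.read.format('csv')
      .option("header", "true")
      .option("inferSchema", "true")
      .load(db_ws.dp_engagement + '/' + sourcefile))

# Or declare the schema explicitly, which avoids the extra pass over the file
from pyspark.sql.types import StructType, StructField, StringType, DecimalType
schema = StructType([
    StructField("date", StringType(), True),                      # assumed column
    StructField("% Iron Concentrate", DecimalType(10, 2), True),  # from the error message
    # ... remaining columns of the dataset
])
df = (spark.read.format('csv')
      .option("header", "true")
      .schema(schema)
      .load(db_ws.dp_engagement + '/' + sourcefile))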

You can't simply change a column's type using ALTER TABLE, especially between incompatible types such as string and number. So either read the data correctly and write it back using .mode("overwrite"), or run create table <table_name> as select cast(...) ... from <table_name> ..., as sketched below.
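A minimal sketch of both options, assuming df was re-read with the correct types as above and reusing the table name from the question; the new table name and the decimal(10, 2) precision are illustrative.

# Option 1: re-read the CSV with correct types, then overwrite the existing table
df.write.format("parquet").mode("overwrite").saveAsTable("MY_PERMANENT_TABLE_NAME")

# Option 2: create a new table from a casting SELECT over the old one
spark.sql("""
    CREATE TABLE MY_PERMANENT_TABLE_NAME_TYPED AS
    SELECT CAST(`% Iron Concentrate` AS DECIMAL(10, 2)) AS `% Iron Concentrate`
    FROM MY_PERMANENT_TABLE_NAME
""")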
