I have a CSV in blob storage and I was able to read it as a Spark DataFrame in Databricks.
sourcefile = 'MiningProcess_Flotation_Plant_Database.csv'
df = spark.read.format('csv').option("header","true").load(db_ws.dp_engagement + '/' + sourcefile)
display(df)
I tried creating a table with this:
df.write.format("parquet").saveAsTable("MY_PERMANENT_TABLE_NAME")
And it works.
But all columns that contain numbers were created as strings.
So I tried to change the type:
%sql
ALTER TABLE MY_PERMANENT_TABLE_NAME CHANGE `% Iron Concentrate` TYPE decimal
But I get this error:
Error in SQL statement: AnalysisException: ALTER TABLE CHANGE COLUMN is not supported for changing column '% Iron Concentrate' with type 'StringType' to '% Iron Concentrate' with type 'DecimalType(10,0)'
Dataset is here:https://www.kaggle.com/code/sfbruno/mining-quality-xgboost/data
You neither specify the schema for your input data using .schema nor set .option("inferSchema", "true"), so the CSV reader assumes that all columns are of string type. If you don't want to specify a schema explicitly, add .option("inferSchema", "true") when reading the data.
You can't simply change the type using ALTER TABLE, especially between incompatible types such as string to number. So either re-read the data with the correct schema and write it back using .mode("overwrite"), or run create table <table_name> as select cast(...) ... from <table_name>
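For the second option, a sketch of the CTAS approach using the table and column names from the question (the target table name and the DECIMAL(10, 2) precision are assumptions; pick a precision wide enough for your data, or cast to DOUBLE):

```sql
-- Create a new table with the column cast to a numeric type;
-- afterwards you can drop the old table and rename this one.
CREATE TABLE MY_PERMANENT_TABLE_NAME_TYPED AS
SELECT CAST(`% Iron Concentrate` AS DECIMAL(10, 2)) AS `% Iron Concentrate`
       -- , cast the remaining numeric columns here as well
FROM MY_PERMANENT_TABLE_NAME;
```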
...