I have a dataframe where all the columns are of type string, and I need them to be of type double. I have code that does it:

from pyspark.sql.types import DoubleType

df_Double = df.select([df[c].cast(DoubleType()).alias(c) for c in df.columns])
The problem is that when I save this new dataframe to disk as a CSV file

df_Double.drop("_c0").toPandas().to_csv("all_Double.csv", header=True)
and then read it back
df_Double = spark.read \
.format("csv") \
.option("inferSchema",True) \
.option("header", True) \
.load("all_Double.csv")
and print its schema
df_Double.printSchema()
all columns are of type string again, just like the original dataframe. How can I persist the types so that I don't have to cast the columns every time I read the dataframe?
You can pass df_Double.schema when you read the CSV file back:
df_Double_load = spark.read \
.format("csv") \
.schema(df_Double.schema) \
.option("header", True) \
.load("all_Double.csv")
Don't combine this with the 'inferSchema' option: inferSchema tells Spark to guess the column types from the data, and here you want the explicit schema to apply instead.
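The underlying issue is that CSV stores everything as text, so the types have to be re-applied on every read. A minimal stdlib-only sketch (no Spark; the column names and values are made up for illustration) showing the round trip losing the types and an explicit "schema" restoring them, analogous to passing df_Double.schema to the reader:

```python
import csv
import io

# Write a small table of doubles as CSV text (stand-in for to_csv).
rows = [{"a": 1.5, "b": 2.0}, {"a": 3.25, "b": 4.75}]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["a", "b"])
writer.writeheader()
writer.writerows(rows)

# Read it back: every value comes back as a string; the types are gone.
buf.seek(0)
raw = list(csv.DictReader(buf))
assert all(isinstance(v, str) for row in raw for v in row.values())

# Re-apply an explicit per-column "schema" on read, as .schema(...) does in Spark.
schema = {"a": float, "b": float}
typed = [{col: schema[col](val) for col, val in row.items()} for row in raw]
print(typed[0]["a"] + typed[1]["a"])  # prints 4.75
```

The same principle holds in Spark: the schema lives in your reading code (or in a typed format such as Parquet), not in the CSV file itself.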