
Python - Pass an operator using literal string?

I have a dictionary of column names (keys) and their data types (values). The data types are literal strings, and I'm trying to cast the columns in my PySpark df to the corresponding data types, i.e.

for k, v in dict.items():
    df.withColumn(f'{k}', col(f'{k}').cast(v))

Obviously the above doesn't work, because the string 'ByteType()' is not the same thing as ByteType(). Does anyone have a creative workaround for this?
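
For illustration, here is a minimal sketch of the mismatch (the DataFrame, column name, and Spark session are hypothetical): cast() accepts a DataType instance or Spark's simple type-name strings such as 'byte', but not the literal string 'ByteType()'.

from pyspark.sql.functions import col
from pyspark.sql.types import ByteType

df = spark.createDataFrame([('1',)], ['c1'])  # hypothetical DataFrame

df.withColumn('c1', col('c1').cast(ByteType()))  # works: DataType instance
df.withColumn('c1', col('c1').cast('byte'))      # works: simple type-name string
# df.withColumn('c1', col('c1').cast('ByteType()'))  # fails: not a valid type string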

from pyspark.sql.types import *  # don't forget to import
from pyspark.sql.functions import col
# Solution 1
for k, v in dict.items():
    # withColumn returns a new Spark DataFrame, so you have to reassign it to df
    df = df.withColumn(f'{k}', col(f'{k}').cast(v))
df.printSchema()  # check the DataFrame's schema
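
For reference, a self-contained run of Solution 1 might look like this (the dictionary and column names are hypothetical, and the type strings use the simple names that cast() accepts):

from pyspark.sql.functions import col

dtypes = {'c1': 'byte', 'c2': 'float'}  # hypothetical column -> type-string mapping
df = spark.createDataFrame([('1', '2.5')], ['c1', 'c2'])

for k, v in dtypes.items():
    df = df.withColumn(k, col(k).cast(v))

df.printSchema()
# root
#  |-- c1: byte (nullable = true)
#  |-- c2: float (nullable = true)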

Solution 2: I don't know your columns' data types, but you can cast to FloatType() instead of ByteType() if a column's values fall outside the range -128 to 127.

After reading the comments, it seems that you simply want to cast one DataFrame's columns to the data types of another.

You can do it like this:

df2.select(*[F.col(c).cast(t) for c, t in df1.dtypes])

Full example:

from pyspark.sql import functions as F

df1 = spark.createDataFrame([('1', '2')], ['c1', 'c2'])
print(df1.dtypes)
# [('c1', 'string'), ('c2', 'string')]

df2 = spark.createDataFrame([(1, 2)], ['c1', 'c2'])
print(df2.dtypes)
# [('c1', 'bigint'), ('c2', 'bigint')]

df2 = df2.select(*[F.col(c).cast(t) for c, t in df1.dtypes])
print(df2.dtypes)
# [('c1', 'string'), ('c2', 'string')]
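
The same select-based pattern can also be driven by a plain dictionary of type-name strings rather than another DataFrame's dtypes; a hedged sketch with a hypothetical mapping:

from pyspark.sql import functions as F

dtype_map = {'c1': 'int', 'c2': 'string'}  # hypothetical column -> type-string mapping
df2 = df2.select(*[F.col(c).cast(t) for c, t in dtype_map.items()])
print(df2.dtypes)
# [('c1', 'int'), ('c2', 'string')]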
