
Python - Pass an operator using literal string?

I have a dictionary of column names (keys) and their data types (values). The data types are literal strings, and I'm trying to cast the columns in my PySpark df to the corresponding data types, i.e.

for k, v in dict.items():
    df.withColumn(f'{k}', col(f'{k}').cast(v))

Obviously the above doesn't work, because the string 'ByteType()' is not the same thing as ByteType(). Does anyone have a creative workaround for this?
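
For illustration, here is a minimal sketch of the mismatch (the DataFrame, column name, and Spark session are hypothetical): cast() accepts a DataType instance or Spark's simple type-name strings such as 'byte', but not the literal string 'ByteType()'.

from pyspark.sql.functions import col
from pyspark.sql.types import ByteType

df = spark.createDataFrame([('1',)], ['c1'])  # hypothetical DataFrame

df.withColumn('c1', col('c1').cast(ByteType()))  # works: DataType instance
df.withColumn('c1', col('c1').cast('byte'))      # works: simple type-name string
# df.withColumn('c1', col('c1').cast('ByteType()'))  # fails: not a valid type string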

from pyspark.sql.types import *  # don't forget to import
from pyspark.sql.functions import col
# Solution 1
for k, v in dict.items():
    # withColumn returns a new Spark DataFrame, so you have to reassign it to df
    df = df.withColumn(f'{k}', col(f'{k}').cast(v))
df.printSchema()  # check the DataFrame's schema
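
For reference, a self-contained run of Solution 1 might look like this (the dictionary and column names are hypothetical, and the type strings use the simple names that cast() accepts):

from pyspark.sql.functions import col

dtypes = {'c1': 'byte', 'c2': 'float'}  # hypothetical column -> type-string mapping
df = spark.createDataFrame([('1', '2.5')], ['c1', 'c2'])

for k, v in dtypes.items():
    df = df.withColumn(k, col(k).cast(v))

df.printSchema()
# root
#  |-- c1: byte (nullable = true)
#  |-- c2: float (nullable = true)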

Solution 2: I don't know your columns' data types, but you can cast to FloatType() instead of ByteType() if a column's values fall outside the range -128 to 127.

After reading the comments, it seems that you simply want to cast one DataFrame's columns to the data types of another.

You can do it like this:

df2.select(*[F.col(c).cast(t) for c, t in df1.dtypes])

Full example:

from pyspark.sql import functions as F

df1 = spark.createDataFrame([('1', '2')], ['c1', 'c2'])
print(df1.dtypes)
# [('c1', 'string'), ('c2', 'string')]

df2 = spark.createDataFrame([(1, 2)], ['c1', 'c2'])
print(df2.dtypes)
# [('c1', 'bigint'), ('c2', 'bigint')]

df2 = df2.select(*[F.col(c).cast(t) for c, t in df1.dtypes])
print(df2.dtypes)
# [('c1', 'string'), ('c2', 'string')]
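
The same select-based pattern can also be driven by a plain dictionary of type-name strings rather than another DataFrame's dtypes; a hedged sketch with a hypothetical mapping:

from pyspark.sql import functions as F

dtype_map = {'c1': 'int', 'c2': 'string'}  # hypothetical column -> type-string mapping
df2 = df2.select(*[F.col(c).cast(t) for c, t in dtype_map.items()])
print(df2.dtypes)
# [('c1', 'int'), ('c2', 'string')]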
