In PySpark, suppose I have a dataframe with columns named 'a1', 'a2', 'a3', ..., 'a99'. How do I apply an operation to each of them to create new columns with new names dynamically?
For example, to get new columns such as sum('a1') as 'total_a1', ..., sum('a99') as 'total_a99'.
You can use a list comprehension with alias.
To return only the new columns:
import pyspark.sql.functions as f
df1 = df.select(*[f.sum(c).alias("total_"+c) for c in df.columns])
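And if you want to keep the existing columns as well, note that putting "*" and aggregates such as f.sum in the same select raises an AnalysisException, because the non-aggregated columns have no grouping. One option is to cross join the single-row totals back onto the original rows:
df2 = df.crossJoin(df1)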
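If the dataframe also holds columns that should not be summed (an id or a string column, say), the list fed to the comprehension can be filtered first. Below is a minimal sketch in plain Python, assuming the columns of interest are named a<number>. The helper name alias_pairs, the default prefix, and the regex are made up for illustration and are not part of PySpark.

```python
import re

def alias_pairs(columns, prefix="total_", pattern=r"a\d+"):
    """Return (column, new_name) pairs for columns that fully match pattern.

    Hypothetical helper: it only builds names and does not touch Spark.
    """
    matcher = re.compile(pattern)
    return [(c, prefix + c) for c in columns if matcher.fullmatch(c)]

# Stand-in for df.columns; "id" does not match the pattern and is skipped.
pairs = alias_pairs(["id", "a1", "a2", "a99"])
print(pairs)
```

The resulting pairs can then drive the same select as above, for example df.select(*[f.sum(c).alias(name) for c, name in pairs]).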