
String manipulation for column names in PySpark

This article gives a great overview of how to change column names: How to change dataframe column names in pyspark?

Nonetheless, I need something more / slightly adjusted that I am not capable of doing myself. Can anybody help me remove the spaces from all column names? It's needed e.g. for join commands, and a systematic approach reduces the effort of dealing with 30 columns. I suppose a combination of a regex and a UDF would work best.

Example:

root
 |-- CLIENT: string (nullable = true)
 |-- Branch Number: string (nullable = true)

There is a really simple solution:

# Rename each column, stripping spaces from its name
for name in df.schema.names:
    df = df.withColumnRenamed(name, name.replace(' ', ''))
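The renaming itself is pure string manipulation, so the mapping can be checked without a Spark session. A minimal sketch, using hypothetical column names from the question; the regex variant also handles other non-word characters, as the asker hinted (no UDF is needed, since UDFs transform row data, not column names):

```python
import re

# Hypothetical column names, mirroring the schema in the question
columns = ["CLIENT", "Branch Number", "Branch-No. (new)"]

# Plain replace: removes spaces only
space_stripped = {name: name.replace(" ", "") for name in columns}

# Regex variant: removes every character that is not alphanumeric or underscore
sanitized = {name: re.sub(r"\W+", "", name) for name in columns}
```

Either dictionary can then drive a `withColumnRenamed` loop like the one above.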

If you want to rename multiple columns to the same column names concatenated with a prefix (or suffix), this should work:

import pyspark.sql.functions as f

df = df.select([f.col(c).alias(PREFIX + c) for c in df.columns])
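The aliasing logic is again just string concatenation inside a list comprehension, so the resulting names can be verified without Spark. A small sketch, where PREFIX and the column list are hypothetical:

```python
PREFIX = "src_"  # hypothetical prefix
columns = ["CLIENT", "BranchNumber"]  # hypothetical column names

# These are the names that f.col(c).alias(PREFIX + c) would assign
aliased = [PREFIX + c for c in columns]
```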
