Remove periods (.) from DataFrame column names

Question

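I have gone through all the examples here on replacing special characters in column names, but I can't seem to get any of them to work for periods.

What I've tried: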

# works to remove spaces
df.select([F.col(c).alias(c.replace(' ', '_')) for c in df.columns])

# doesn't work to remove periods
df.select([F.col(c).alias(c.replace('.', '')) for c in df.columns])

# removes special characters except periods 
df.select([F.col(col).alias(re.sub("[^0-9a-zA-Z$]+","",col)) for col in df.columns])

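I know how to rename a column by referencing that specific column, but this needs to work on any dataframe whose column names contain special characters.

Specifically, this is the column name that is giving me trouble: "Src. of Business Contact Full Name"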

Answer 1

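Try escaping the column names with backticks, as in `col_name`. Without the backticks, Spark reads a period inside F.col() as access to a nested field rather than as part of the column name.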

df=spark.createDataFrame([('1','2')],['header','pla.nned'])
df.columns
#['header', 'pla.nned']

from pyspark.sql import functions as F
df.select([F.col("`{0}`".format(c)).alias(c.replace('.', '')) for c in df.columns]).show()
#+------+-------+
#|header|planned|
#+------+-------+
#|     1|      2|
#+------+-------+
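As a rough sketch of the same idea, the quoting and the cleaning can be split into two small helpers. The names `quote_column` and `clean_name` are made up for this illustration, and the doubling of embedded backticks is an assumption about how Spark SQL escapes quoted identifiers. The helpers are plain Python, so they can be checked without a Spark session.

```python
import re


def quote_column(name: str) -> str:
    """Wrap a column name in backticks so a period is read literally."""
    # Assumption: a backtick inside the name is escaped by doubling it.
    return "`" + name.replace("`", "``") + "`"


def clean_name(name: str) -> str:
    """Turn spaces into underscores, then drop anything that is not
    a letter, a digit or an underscore (this includes periods)."""
    return re.sub(r"[^0-9a-zA-Z_]+", "", name.replace(" ", "_"))


# Intended use with PySpark (not executed here):
#   from pyspark.sql import functions as F
#   df.select([F.col(quote_column(c)).alias(clean_name(c)) for c in df.columns])

print(quote_column("pla.nned"))
print(clean_name("Src. of Business Contact Full Name"))
# Src_of_Business_Contact_Full_Name
```

Keeping the two steps separate makes it easy to change the cleaning rule without touching the escaping.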

Answer 2

If your data is in a pandas DataFrame, you don't need select() at all. Why not keep it simple, like below?

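import re
import pandas as pd

df = pd.DataFrame(["a biz"], columns=["Src.$ of-Business Contact` Full Name"])
# Drop periods, commas, backticks and dollar signs, then turn spaces and hyphens into underscores
df.columns = [re.sub(r"[ ,-]", "_", re.sub(r"[.,`$]", "", c)) for c in df.columns]
df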

Output:

    Src_of_Business_Contact_Full_Name
0   a biz
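If the same cleanup is needed in more than one place, the regex from this answer could be wrapped in a function that also guards against two columns collapsing to the same name. This is only a sketch: `clean_names` is a hypothetical helper, and the duplicate check is an addition that the answer itself does not include.

```python
import re
from collections import Counter


def clean_names(columns):
    """Return cleaned column names, refusing to produce duplicates."""
    cleaned = [
        # Same two passes as above: drop . , ` $ and then
        # replace spaces and hyphens with underscores.
        re.sub(r"[ -]", "_", re.sub(r"[.,`$]", "", c))
        for c in columns
    ]
    duplicates = sorted(n for n, k in Counter(cleaned).items() if k > 1)
    if duplicates:
        raise ValueError(f"cleaned names collide: {duplicates}")
    return cleaned


print(clean_names(["Src.$ of-Business Contact` Full Name", "header"]))
# ['Src_of_Business_Contact_Full_Name', 'header']
```

With pandas this would be used as `df.columns = clean_names(df.columns)`. For a Spark DataFrame, `df.toDF(*clean_names(df.columns))` should have the same effect, since `toDF` assigns the new names by position.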

Answer 3

Another way to go is to use reduce and withColumnRenamed.

from functools import reduce

(reduce(lambda new_df, col: new_df.withColumnRenamed(col,col.replace('.','')),df.columns,df)).show()
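To make the fold easier to follow, here is a sketch that compares the reduce call with an explicit loop. `FakeFrame` is a hypothetical stand-in that only tracks column names, so the example runs without Spark; it assumes, as with a real Spark DataFrame, that withColumnRenamed returns a new object and matches the existing name as a plain string.

```python
from functools import reduce


class FakeFrame:
    """Hypothetical stand-in for a Spark DataFrame (column names only)."""

    def __init__(self, columns):
        self.columns = list(columns)

    def withColumnRenamed(self, existing, new):
        # Return a new object instead of mutating in place.
        return FakeFrame([new if c == existing else c for c in self.columns])


def strip_periods(df):
    """Explicit-loop equivalent of the reduce one-liner."""
    for col in df.columns:  # iterates over the original column list
        df = df.withColumnRenamed(col, col.replace(".", ""))
    return df


df = FakeFrame(["header", "pla.nned"])
via_loop = strip_periods(df)
via_reduce = reduce(
    lambda acc, col: acc.withColumnRenamed(col, col.replace(".", "")),
    df.columns,
    df,
)
print(via_loop.columns)
# ['header', 'planned']
print(via_reduce.columns == via_loop.columns)
# True
```

Each step feeds the renamed frame into the next rename, which is all that reduce does here.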

Answer 1: score 1, 2020-07-09 16:37:02

Answer 2: score 1, accepted, 2020-07-09 16:43:22

Answer 3: score 0, 2020-07-09 16:41:49
