Remove Period (.) from Dataframe Column Names
So I've gone through all the examples on here of replacing special characters from column names, but I can't seem to get it to work for periods.
What I've tried:
import re
from pyspark.sql import functions as F

# works to remove spaces
df.select([F.col(c).alias(c.replace(' ', '_')) for c in df.columns])
# doesn't work to remove periods
df.select([F.col(c).alias(c.replace('.', '')) for c in df.columns])
# removes special characters except periods
df.select([F.col(col).alias(re.sub("[^0-9a-zA-Z$]+", "", col)) for col in df.columns])
I know how to change the name of a column by referencing that specific column, but this needs to change names of columns for any dataframe with columns with special characters.
Specifically, here is the column name that is giving me trouble: "Src. of Business Contact Full Name"
Try escaping the column names using backquotes, i.e. `col_name`:
df=spark.createDataFrame([('1','2')],['header','pla.nned'])
df.columns
#['header', 'pla.nned']
from pyspark.sql import functions as F
df.select([F.col("`{0}`".format(c)).alias(c.replace('.', '')) for c in df.columns]).show()
#+------+-------+
#|header|planned|
#+------+-------+
#| 1| 2|
#+------+-------+
select() is a deprecated method. Why not make it as simple as below?
import re
import pandas as pd

df = pd.DataFrame(["a biz"], columns=["Src.$ of-Business Contact` Full Name"])
df.columns = [re.sub(r"[ ,-]", "_", re.sub(r"[.,`$]", "", c)) for c in df.columns]
df
output
Src_of_Business_Contact_Full_Name
0 a biz
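If you are in pandas anyway, the same cleanup can also be sketched with the vectorized str accessor on df.columns (this chained-replace variant is my own, not from the answer above):

```python
import pandas as pd

df = pd.DataFrame(["a biz"], columns=["Src.$ of-Business Contact` Full Name"])
df.columns = (
    df.columns
    .str.replace(r"[.`$]", "", regex=True)   # drop periods, backticks, dollar signs
    .str.replace(r"[ ,-]", "_", regex=True)  # spaces, commas, hyphens -> underscores
)
print(df.columns.tolist())  # ['Src_of_Business_Contact_Full_Name']
```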
Another way to go about this is using reduce and withColumnRenamed.
from functools import reduce

# fold over the columns, renaming each one with its periods stripped
reduce(lambda new_df, col: new_df.withColumnRenamed(col, col.replace('.', '')), df.columns, df).show()
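The snippets above each hard-code one substitution. A small plain-Python helper (the name sanitize is made up here) can combine the ideas: strip periods and other punctuation first, then turn separators into underscores:

```python
import re

def sanitize(name):
    # hypothetical helper: drop periods, backticks, and dollar signs,
    # then collapse runs of spaces, commas, and hyphens into underscores
    name = re.sub(r"[.`$]", "", name)
    return re.sub(r"[ ,-]+", "_", name)

print(sanitize("Src. of Business Contact Full Name"))
# Src_of_Business_Contact_Full_Name
```

With Spark, this could then be applied to every column in one pass, e.g. df.toDF(*[sanitize(c) for c in df.columns]), avoiding the need to escape dotted names with backquotes at all.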