Remove Period (.) from Dataframe Column Names
So I've gone through all the examples on here of replacing special characters from column names, but I can't seem to get it to work for periods.
What I've tried:
import re
from pyspark.sql import functions as F

# works to remove spaces
df.select([F.col(c).alias(c.replace(' ', '_')) for c in df.columns])
# doesn't work to remove periods
df.select([F.col(c).alias(c.replace('.', '')) for c in df.columns])
# removes special characters except periods
df.select([F.col(col).alias(re.sub("[^0-9a-zA-Z$]+", "", col)) for col in df.columns])
I know how to change the name of a column by referencing that specific column, but this needs to change names of columns for any dataframe with columns with special characters.
Specifically, here is the column name that is giving me trouble: "Src. of Business Contact Full Name"
Try escaping the column names using backquotes, i.e. `col_name`:
df=spark.createDataFrame([('1','2')],['header','pla.nned'])
df.columns
#['header', 'pla.nned']
from pyspark.sql import functions as F
df.select([F.col("`{0}`".format(c)).alias(c.replace('.', '')) for c in df.columns]).show()
#+------+-------+
#|header|planned|
#+------+-------+
#| 1| 2|
#+------+-------+
select() is a deprecated method. Why not make it as simple as below?
import re
import pandas as pd

df = pd.DataFrame(["a biz"], columns=["Src.$ of-Business Contact` Full Name"])
df.columns = [re.sub(r"[ ,-]", "_", re.sub(r"[.,`$]", "", c)) for c in df.columns]
df
output
Src_of_Business_Contact_Full_Name
0 a biz
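If you are in pandas anyway, the same cleanup can also be sketched with the vectorized str accessor on df.columns (this chained-replace variant is my own, not from the answer above):

```python
import pandas as pd

df = pd.DataFrame(["a biz"], columns=["Src.$ of-Business Contact` Full Name"])
df.columns = (
    df.columns
    .str.replace(r"[.`$]", "", regex=True)   # drop periods, backticks, dollar signs
    .str.replace(r"[ ,-]", "_", regex=True)  # spaces, commas, hyphens -> underscores
)
print(df.columns.tolist())  # ['Src_of_Business_Contact_Full_Name']
```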
Another way to go about this is using reduce and withColumnRenamed.
from functools import reduce

# fold over the columns, renaming each one with its periods stripped
reduce(lambda new_df, col: new_df.withColumnRenamed(col, col.replace('.', '')), df.columns, df).show()
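The snippets above each hard-code one substitution. A small plain-Python helper (the name sanitize is made up here) can combine the ideas: strip periods and other punctuation first, then turn separators into underscores:

```python
import re

def sanitize(name):
    # hypothetical helper: drop periods, backticks, and dollar signs,
    # then collapse runs of spaces, commas, and hyphens into underscores
    name = re.sub(r"[.`$]", "", name)
    return re.sub(r"[ ,-]+", "_", name)

print(sanitize("Src. of Business Contact Full Name"))
# Src_of_Business_Contact_Full_Name
```

With Spark, this could then be applied to every column in one pass, e.g. df.toDF(*[sanitize(c) for c in df.columns]), avoiding the need to escape dotted names with backquotes at all.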