Remove Period (.) from Dataframe Column Names

I've gone through all the examples on here of replacing special characters in column names, but I can't seem to get it to work for periods.

What I've tried:

# works to remove spaces
df.select([F.col(c).alias(c.replace(' ', '_')) for c in df.columns])

# doesn't work to remove periods
df.select([F.col(c).alias(c.replace('.', '')) for c in df.columns])

# removes special characters except periods 
df.select([F.col(col).alias(re.sub("[^0-9a-zA-Z$]+","",col)) for col in df.columns])

I know how to change the name of a column by referencing that specific column, but this needs to change the column names of any dataframe whose columns contain special characters.

Specifically, here is the column name that is giving me trouble: "Src. of Business Contact Full Name"

Try escaping the column names with backquotes, i.e. `col_name`. Without them, Spark treats the period as a struct-field separator instead of as part of the name.


df=spark.createDataFrame([('1','2')],['header','pla.nned'])
df.columns
#['header', 'pla.nned']

from pyspark.sql import functions as F
df.select([F.col("`{0}`".format(c)).alias(c.replace('.', '')) for c in df.columns]).show()
#+------+-------+
#|header|planned|
#+------+-------+
#|     1|      2|
#+------+-------+
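
The same backquote escaping can be wrapped in a small helper so that it works for any dataframe. The following is only a sketch: build_select_exprs, its pattern argument and its default regex are hypothetical names and choices made for illustration, and it assumes that no column name contains a backquote itself and that no name becomes empty after cleaning. It builds plain SQL expression strings, which can then be passed to selectExpr.

```python
import re


def build_select_exprs(columns, pattern=r"[^0-9a-zA-Z_]+", repl=""):
    """Return one SQL expression per column: `old name` AS `clean_name`.

    Hypothetical helper; assumes no column name contains a backquote.
    """
    exprs = []
    for old in columns:
        # Strip every run of characters that is not a letter, digit or underscore.
        new = re.sub(pattern, repl, old)
        # Backquotes make Spark read the whole old name literally,
        # so the period is not taken as a struct-field separator.
        exprs.append("`{0}` AS `{1}`".format(old, new))
    return exprs


print(build_select_exprs(["header", "pla.nned"]))
# ['`header` AS `header`', '`pla.nned` AS `planned`']
```

With a Spark dataframe this would be used as df.selectExpr(*build_select_exprs(df.columns)). Two different names can collapse to the same cleaned name, so it may be worth checking the result for duplicates first.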

select() is not deprecated in PySpark, but if the data is in a pandas DataFrame rather than a Spark one, why not make it as simple as below?

import re
import pandas as pd

df = pd.DataFrame(["a biz"], columns=["Src.$ of-Business Contact` Full Name"])
# Drop '.', '`' and '$', then turn spaces and hyphens into underscores
df.columns = [re.sub(r"[ -]", "_", re.sub(r"[.`$]", "", c)) for c in df.columns]
df

Output:

    Src_of_Business_Contact_Full_Name
0   a biz

Another way to go about this is to use reduce and withColumnRenamed.

from functools import reduce

(reduce(lambda new_df, col: new_df.withColumnRenamed(col,col.replace('.','')),df.columns,df)).show()
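
If every column is being renamed anyway, a positional rename with toDF may be simpler than chaining withColumnRenamed, since the old names never need to be referenced or escaped. This is a sketch under assumptions: sanitized_names and rename_all are hypothetical helper names, and the default regex (keep only letters, digits and underscores) is just one possible choice.

```python
import re


def sanitized_names(columns, pattern=r"[^0-9a-zA-Z_]+", repl=""):
    """Return the cleaned column names, in the same order as the input."""
    return [re.sub(pattern, repl, c) for c in columns]


def rename_all(df, pattern=r"[^0-9a-zA-Z_]+", repl=""):
    """Rename all columns of a Spark dataframe by position.

    toDF(*names) assigns the new names in column order, so names
    containing periods do not have to be quoted with backquotes.
    """
    return df.toDF(*sanitized_names(df.columns, pattern, repl))


print(sanitized_names(["header", "pla.nned"]))
# ['header', 'planned']
```

Usage would be rename_all(df).show(). Passing repl="_" keeps word boundaries visible instead of gluing the words together.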

