PySpark remove string before a character from all column names

I have some column names in a dataset that contain three underscores (___). Using PySpark, I would like to remove all characters before the underscores, including the underscores themselves, and keep the remaining characters as the column name. The code needs to rename the columns dynamically rather than hard-coding the column names. If ___ is at the start or end of a column name, only the ___ should be removed and the remaining characters left as they are.

Example:

Input column names:

sequence_number   
department  
user___first_name  
user___last_name  
phone___mobile1
___city  
state___
zip_code

Desired output column names:

sequence_number   
department  
first_name  
last_name  
mobile1
city  
state
zip_code

Try this:

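import re

def normalize(col):
    """Drop everything up to and including '___'; a trailing '___' is simply removed."""
    # rstrip("___") would strip every trailing underscore, so cut the exact suffix instead
    if col.endswith("___"):
        col = col[:-3]
    # greedy match: keeps only the text after the last remaining '___'
    return re.sub(r'^.*___', '', col)

# normalize column names in the DataFrame
df = df.toDF(*[normalize(c) for c in df.columns])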
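
To check the renaming logic without a Spark session, here is a minimal sketch in plain Python, run against the column names from the question. The helper name `strip_prefix` and the `SEP` constant are made up for illustration, and it uses `str.rpartition` in place of a regex. It also collects any names that become identical after renaming, since two columns such as `a___id` and `b___id` would both end up as `id`.

```python
from collections import Counter

SEP = "___"  # separator taken from the question

def strip_prefix(name: str, sep: str = SEP) -> str:
    """Hypothetical helper: keep only the text after the last `sep`."""
    # Drop one exact trailing separator (str.rstrip would strip any run of "_").
    if name.endswith(sep):
        name = name[: -len(sep)]
    # rpartition returns ("", "", name) when sep is absent, so such names pass through.
    return name.rpartition(sep)[2]

columns = [
    "sequence_number", "department", "user___first_name", "user___last_name",
    "phone___mobile1", "___city", "state___", "zip_code",
]
renamed = [strip_prefix(c) for c in columns]

# Names that occur more than once after renaming, e.g. a___id and b___id -> id.
duplicates = [name for name, count in Counter(renamed).items() if count > 1]

print(renamed)
print(duplicates)
```

If `duplicates` comes back empty, the new names can be applied with `df.toDF(*renamed)`, where `columns` would be `df.columns` in a real job.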


