Replace all characters at once in Pandas dataframe

I have multiple columns with names in different formats. For example:

df.columns = ['name_column 1 (type1)', 'name-column_2-(type1)', ...]

I need to replace every special character (anything other than a letter, a digit or the underscore itself) with an underscore. But where several special characters appear in a row, such as '-(', I need just one underscore '_', not one for each special character.

Desired output:

df.columns = ['name_column_1_type1', 'name_column_2_type1', ...]

I have tried:

import re

for element in df.columns:
    re.sub('[^A-Za-z0-9]+', '_', element)
    print(element)

But nothing happens, just as with a few other attempts.

Thanks in advance.

Use str.replace + str.strip:

# regex=True is needed on pandas >= 2.0, where str.replace defaults to regex=False
df.columns = df.columns.str.replace('[^A-Za-z0-9]+', '_', regex=True).str.strip('_')

Sample:

import pandas as pd

df = pd.DataFrame(columns=["'name_column 1 (type1)", 'name-column_2-((type1)'])
print(df.columns.tolist())
["'name_column 1 (type1)", 'name-column_2-((type1)']

df.columns = df.columns.str.replace('[^A-Za-z0-9]+', '_', regex=True).str.strip('_')
print(df)
Empty DataFrame
Columns: [name_column_1_type1, name_column_2_type1]
Index: []

print(df.columns.tolist())
['name_column_1_type1', 'name_column_2_type1']
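The two steps of that chain can be traced on single names with the standard re module. This is a minimal sketch; clean_column_name is a hypothetical helper name used only for illustration, not part of pandas. It also contrasts the pattern with and without the + quantifier, which is what turns '-(' into one underscore instead of two.

```python
import re


def clean_column_name(name: str) -> str:
    """Hypothetical helper doing the same two steps as the pandas chain above."""
    # The '+' quantifier turns each run of non-alphanumeric characters
    # (e.g. ' (' or '-(') into a single underscore
    collapsed = re.sub(r'[^A-Za-z0-9]+', '_', name)
    # strip('_') drops underscores left at either end, e.g. from a final ')'
    return collapsed.strip('_')


for name in ['name_column 1 (type1)', 'name-column_2-(type1)']:
    # Without '+', every special character gets its own underscore
    per_char = re.sub(r'[^A-Za-z0-9]', '_', name)
    print(per_char, '->', clean_column_name(name))
# name_column_1__type1_ -> name_column_1_type1
# name_column_2__type1_ -> name_column_2_type1
```

Since DataFrame.rename accepts a callable for columns, df = df.rename(columns=clean_column_name) would apply the same helper without going through the .str accessor.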

Try:

import re

# Collapse each run of special characters into one '_', then trim '_' from both ends
df.columns = [re.sub('[^A-Za-z0-9]+', '_', i).strip('_') for i in df.columns]
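Two pitfalls are worth knowing with regex-plus-replace approaches like this one. First, a letter range written as A-z covers every code point from 'A' to 'z', which includes six punctuation characters that sit between 'Z' and 'a' in ASCII, so a class like [^A-z0-9] silently keeps them. Second, a single str.replace('__', '_') pass does not fully collapse longer runs of underscores. This stdlib-only sketch shows both; the sample string is made up for illustration.

```python
import re

# Characters with code points between 'Z' (90) and 'a' (97)
between = ''.join(chr(c) for c in range(ord('Z') + 1, ord('a')))
print(between)  # [\]^_`

sample = 'a[b]c^d`e'  # made-up example string
# A-z includes those characters, so this negated class leaves them untouched
print(re.sub('[^A-z0-9]', '_', sample))     # a[b]c^d`e
# Spelling out A-Za-z replaces them as intended
print(re.sub('[^A-Za-z0-9]', '_', sample))  # a_b_c_d_e

# One replace('__', '_') pass only shortens a run of three underscores to two
print('a___b'.replace('__', '_'))           # a__b
```

Using a + quantifier in the pattern avoids the second problem entirely, because each run is already replaced by one underscore.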

Nothing happens because re.sub returns a new string (Python strings are immutable) and that result is never assigned to anything, so it is lost. You could use a list comprehension and assign the result back to df.columns:

import re

df.columns = [re.sub('[^A-Za-z0-9]+', '_', element) for element in df.columns]
print(df.columns)

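The result still is not exactly the desired output: a name ending in ')' keeps a trailing underscore (for example, 'name_column 1 (type1)' becomes 'name_column_1_type1_'), so you would still need to strip it, but this should get you started.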
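The underlying point, that re.sub returns a new string instead of changing its argument, can be checked without pandas. This minimal sketch uses a plain list as a stand-in for df.columns (an assumption for illustration only):

```python
import re

columns = ['name_column 1 (type1)', 'name-column_2-(type1)']  # stand-in for df.columns

# re.sub never modifies its input; it returns a new string
for element in columns:
    re.sub('[^A-Za-z0-9]+', '_', element)  # return value is discarded

print(columns)  # unchanged: ['name_column 1 (type1)', 'name-column_2-(type1)']

# Keeping the return values, e.g. with a list comprehension, makes the change stick
columns = [re.sub('[^A-Za-z0-9]+', '_', element) for element in columns]
print(columns)  # ['name_column_1_type1_', 'name_column_2_type1_']
```

The second result also shows the trailing underscore produced by the final ')', which calling .strip('_') on each name would remove.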
