Replace all characters at once in Pandas dataframe
I have multiple columns with different name formats. For example:
df.columns = ['name_column 1 (type1)', 'name-column_2-(type1)', ...]
I need to replace all special characters (anything other than letters, digits and underscores) with an underscore. But if several special characters occur in a row, such as '-(', I need just one underscore '_', not one for each special character.
Desired output:
df.columns = ['name_column_1_type1', 'name_column_2_type1', ...]
I have tried:
for element in df.columns:
    re.sub('[^A-Za-z0-9]+', '_', element)
    print(element)
But nothing happens, just as in a few other attempts.
Thanks in advance
Use str.replace + str.strip (pass regex=True, because pandas 2.0 and later treat the pattern as a literal string by default):
df.columns = df.columns.str.replace('[^A-Za-z0-9]+', '_', regex=True).str.strip('_')
Sample:
import pandas as pd

df = pd.DataFrame(columns=["'name_column 1 (type1)", 'name-column_2-((type1)'])
print(df.columns.tolist())
["'name_column 1 (type1)", 'name-column_2-((type1)']
# Replace each run of non-alphanumeric characters with one underscore, then trim the ends
df.columns = df.columns.str.replace('[^A-Za-z0-9]+', '_', regex=True).str.strip('_')
print(df)
Empty DataFrame
Columns: [name_column_1_type1, name_column_2_type1]
Index: []
print (df.columns.tolist())
['name_column_1_type1', 'name_column_2_type1']
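To see why both the `+` in the pattern and the final strip matter, the steps can be checked on a single name with the standard `re` module. This is only an illustrative sketch outside pandas; the variable names are made up for the example.

```python
import re

name = 'name-column_2-((type1)'

# Without '+', every special character becomes its own underscore.
one_per_char = re.sub('[^A-Za-z0-9]', '_', name)

# With '+', a run such as '-((' collapses into a single underscore.
one_per_run = re.sub('[^A-Za-z0-9]+', '_', name)

# The closing ')' still leaves a trailing underscore, so strip it.
cleaned = one_per_run.strip('_')

print(one_per_char)
print(one_per_run)
print(cleaned)
```

The pandas one-liner applies the same two steps to every column name at once.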
Try:
import re
# Replace each special character, then collapse repeated underscores and trim the ends
df.columns = [re.sub('_+', '_', re.sub('[^A-Za-z0-9_]', '_', i)).strip('_') for i in df.columns]
Nothing happens because the result of re.sub is not assigned to anything and is therefore lost. You could use a list comprehension and assign the result back to df.columns:
import re
df.columns = [re.sub('[^A-Za-z0-9]+', '_', element) for element in df.columns]
print(df.columns)
The regex pattern is still not quite right (a special character at the start or end of a name, such as a closing ')', leaves a leading or trailing underscore), but this should get you started.
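One way to finish this approach is to strip the leftover underscores inside the comprehension. A minimal sketch, using a plain list in place of df.columns so it runs without pandas; `normalize` is a hypothetical helper name, not something from the question.

```python
import re


def normalize(name):
    # re.sub returns a new string; the original is never modified in place.
    return re.sub('[^A-Za-z0-9]+', '_', name).strip('_')


columns = ["'name_column 1 (type1)", 'name-column_2-((type1)']

# Calling re.sub in a loop without keeping the result changes nothing.
for element in columns:
    re.sub('[^A-Za-z0-9]+', '_', element)
unchanged = list(columns)

# Building a new list and assigning it is what actually takes effect.
columns = [normalize(element) for element in columns]
print(columns)
```

With a DataFrame, the last assignment would be `df.columns = [normalize(c) for c in df.columns]`.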