Renaming column names in Pandas
I want to change the column labels of a Pandas DataFrame from
['$a', '$b', '$c', '$d', '$e']
to
['a', 'b', 'c', 'd', 'e']
Use the df.rename() function and refer to the columns to be renamed. Not all the columns have to be renamed:
df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy)
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)
Minimal Code Example
df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df
a b c d e
0 x x x x x
1 x x x x x
2 x x x x x
The following methods all work and produce the same output:
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1) # new method
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'}) # old method
df2
X Y c d e
0 x x x x x
1 x x x x x
2 x x x x x
Remember to assign the result back, as the modification is not in place. Alternatively, specify inplace=True:
df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df
X Y c d e
0 x x x x x
1 x x x x x
2 x x x x x
From v0.25, you can also specify errors='raise' to raise errors if an invalid column-to-rename is specified. See the v0.25 rename() docs.
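A minimal sketch of the difference between the default behaviour and errors='raise', assuming pandas 0.25 or later. The column name 'not_a_column' is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# Default (errors='ignore'): keys that match no column are skipped silently.
unchanged = df.rename(columns={'not_a_column': 'X'})

# With errors='raise', the same call raises a KeyError instead.
try:
    df.rename(columns={'not_a_column': 'X'}, errors='raise')
    raised = False
except KeyError:
    raised = True

print(list(unchanged.columns), raised)
```

The strict mode is mostly useful for catching typos in the mapping.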
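Use df.set_axis() with axis=1 and inplace=False (to return a copy). Note that the inplace argument was deprecated in pandas 1.5 and removed in pandas 2.0, where set_axis always returns a new object; on those versions, omit it.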
df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1, inplace=False)
df2
V W X Y Z
0 x x x x x
1 x x x x x
2 x x x x x
This returns a copy, but you can modify the DataFrame in-place by setting inplace=True (this is the default behaviour for versions <=0.24 but is likely to change in the future).
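For readers on newer releases, here is a sketch of the same rename without the inplace argument. It assumes pandas 1.0 or later, where set_axis returns a new object by default.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# No inplace argument: the original df keeps its labels, df2 gets the new ones.
df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1)

print(list(df.columns))
print(list(df2.columns))
```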
You can also assign headers directly:
df.columns = ['V', 'W', 'X', 'Y', 'Z']
df
V W X Y Z
0 x x x x x
1 x x x x x
2 x x x x x
Just assign it to the .columns attribute:
>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df
$a $b
0 1 10
1 2 20
>>> df.columns = ['a', 'b']
>>> df
a b
0 1 10
1 2 20
As documented in Working with text data:
df.columns = df.columns.str.replace('$', '', regex=False)
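Because '$' is also a regular-expression anchor, it can help to be explicit about whether the pattern is literal. The sketch below assumes a pandas version that supports the regex keyword of str.replace (0.23 or later); the column 'b$c' is made up to show the difference.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], 'b$c': [10, 20]})

# Literal replacement: every '$' character is removed.
literal = df.columns.str.replace('$', '', regex=False)

# Regex replacement: only a '$' at the start of the label is removed.
leading_only = df.columns.str.replace(r'^\$', '', regex=True)

print(list(literal))
print(list(leading_only))
```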
There have been some significant updates to column renaming in version 0.21.
The rename method has added the axis parameter, which may be set to columns or 1. This update makes this method match the rest of the pandas API. It still has the index and columns parameters, but you are no longer forced to use them.
The set_axis method with inplace set to False enables you to rename all the index or column labels with a list.
Construct sample DataFrame:
df = pd.DataFrame({'$a':[1,2], '$b': [3,4],
'$c':[5,6], '$d':[7,8],
'$e':[9,10]})
$a $b $c $d $e
0 1 3 5 7 9
1 2 4 6 8 10
Using rename with axis='columns' or axis=1
df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns')
or
df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis=1)
Both result in the following:
a b c d e
0 1 3 5 7 9
1 2 4 6 8 10
It is still possible to use the old method signature:
df.rename(columns={'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'})
The rename function also accepts functions that will be applied to each column name.
df.rename(lambda x: x[1:], axis='columns')
or
df.rename(lambda x: x[1:], axis=1)
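Any callable that maps one label to another works here, not only a lambda. A small sketch, where strip_dollar is a hypothetical helper name introduced for illustration:

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4], '$c': [5, 6]})

def strip_dollar(label):
    # Hypothetical helper: remove leading '$' characters only.
    return label.lstrip('$')

renamed = df.rename(strip_dollar, axis='columns')

# Built-in string methods can be passed directly as well.
upper = renamed.rename(columns=str.upper)

print(list(renamed.columns))
print(list(upper.columns))
```

Unlike x[1:], lstrip('$') leaves labels that do not start with '$' untouched.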
Using set_axis with a list and inplace=False
You can supply a list to the set_axis method that is equal in length to the number of columns (or index). In pandas 0.21, inplace defaults to True, but inplace will default to False in future releases. (The default did change to False in pandas 1.0, and the argument was removed in pandas 2.0, so on recent versions omit inplace.)
df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns', inplace=False)
or
df.set_axis(['a', 'b', 'c', 'd', 'e'], axis=1, inplace=False)
Why not use df.columns = ['a', 'b', 'c', 'd', 'e']?
There is nothing wrong with assigning columns directly like this. It is a perfectly good solution.
The advantage of using set_axis is that it can be used as part of a method chain and that it returns a new copy of the DataFrame. Without it, you would have to store the intermediate steps of the chain in another variable before reassigning the columns.
# new for pandas 0.21+
(df.some_method1()
   .some_method2()
   .set_axis()
   .some_method3())
# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()
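To make the chaining idea concrete, here is a runnable sketch in which sort_values, reset_index and assign stand in for the placeholder methods. Those stand-ins are my choice, not part of the answer, and the call assumes pandas 1.0 or later so that set_axis returns a new object.

```python
import pandas as pd

df = pd.DataFrame({'$a': [3, 1, 2], '$b': [30, 10, 20]})

result = (
    df.sort_values('$a')                        # stand-in for some_method1
      .reset_index(drop=True)                   # stand-in for some_method2
      .set_axis(['a', 'b'], axis=1)             # rename every column mid-chain
      .assign(total=lambda d: d['a'] + d['b'])  # stand-in for some_method3
)

print(result)
```

The original df keeps its '$'-prefixed labels because no step modifies it in place.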
Since you only want to remove the $ sign in all column names, you could just do:
df = df.rename(columns=lambda x: x.replace('$', ''))
OR
df.rename(columns=lambda x: x.replace('$', ''), inplace=True)
Renaming columns in Pandas is an easy task.
df.rename(columns={'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}, inplace=True)
df.columns = ['a', 'b', 'c', 'd', 'e']
It will replace the existing names with the names you provide, in the order you provide them.
Use:
old_names = ['$a', '$b', '$c', '$d', '$e']
new_names = ['a', 'b', 'c', 'd', 'e']
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)
This way you can manually edit the new_names as you wish. It works great when you need to rename only a few columns to correct misspellings, accents, remove special characters, etc.
I would like to explain a bit what happens behind the scenes.
DataFrames are a set of Series.
Series in turn are built on top of a numpy.array.
Unlike a plain numpy.array, a Series has a property .name.
This is the name of the series. It is seldom that Pandas respects this attribute, but it lingers in places and can be used to hack some Pandas behaviors.
A lot of answers here talk about the df.columns attribute being a list when in fact it is an Index, a Series-like object. This means it has a .name attribute.
This is what happens if you decide to fill in the name of the columns Index:
df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']
name of the list of columns column_one column_two
name of the index
0 4 1
1 5 2
2 6 3
Note that the name of the index always comes one row lower.
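The axis names can also be set without mutating the DataFrame. This is a sketch using rename_axis, which I am swapping in for the direct .names assignment above; it assumes pandas 0.24 or later, and the sample values are taken from the output shown.

```python
import pandas as pd

df = pd.DataFrame({'column_one': [4, 5, 6], 'column_two': [1, 2, 3]})

# df.columns is a pandas Index, not a plain list.
print(type(df.columns))

# rename_axis returns a new DataFrame with the axis names set.
named = df.rename_axis(index='name of the index',
                       columns='name of the list of columns')

print(named)
```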
The .name attribute lingers on sometimes. If you set df.columns = ['one', 'two'] then df.one.name will be 'one'.
If you set df.one.name = 'three' then df.columns will still give you ['one', 'two'], and df.one.name will give you 'three'.
pd.DataFrame(df.one) will return
three
0 1
1 2
2 3
Because Pandas reuses the .name of the already defined Series.
Pandas has ways of doing multi-layered column names. There is not so much magic involved, but I wanted to cover this in my answer too since I don't see anyone picking up on this here.
|one |
|one |two |
0 | 4 | 1 |
1 | 5 | 2 |
2 | 6 | 3 |
This is easily achievable by setting columns to lists, like this:
df.columns = [['one', 'one'], ['one', 'two']]
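Once the columns have several levels, rename can target a single level through its level argument. A minimal sketch, with the sample values taken from the table above and the replacement label 'TWO' made up:

```python
import pandas as pd

df = pd.DataFrame([[4, 1], [5, 2], [6, 3]])
df.columns = pd.MultiIndex.from_arrays([['one', 'one'], ['one', 'two']])

# Rename only labels in the second level (level=1); level 0 is left alone.
renamed = df.rename(columns={'two': 'TWO'}, level=1)

print(renamed.columns.tolist())
```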
I'll focus on two things:
OP clearly states
I have the edited column names stored it in a list, but I don't know how to replace the column names.
I do not want to solve the problem of how to replace '$' or strip the first character off of each column header. OP has already done this step. Instead I want to focus on replacing the existing columns object with a new one given a list of replacement column names.
df.columns = new, where new is the list of new column names, is as simple as it gets. The drawback of this approach is that it requires editing the existing dataframe's columns attribute and it isn't done inline. I'll show a few ways to perform this via pipelining without editing the existing dataframe.
Setup 1
To focus on the need to rename or replace column names with a pre-existing list, I'll create a new sample dataframe df with initial column names and unrelated new column names.
df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3, 4], 'Xin': [5, 6]})
new = ['x098', 'y765', 'z432']
df
Jack Mahesh Xin
0 1 3 5
1 2 4 6
Solution 1
pd.DataFrame.rename
It has been said already that if you had a dictionary mapping the old column names to new column names, you could use pd.DataFrame.rename.
d = {'Jack': 'x098', 'Mahesh': 'y765', 'Xin': 'z432'}
df.rename(columns=d)
x098 y765 z432
0 1 3 5
1 2 4 6
However, you can easily create that dictionary and include it in the call to rename. The following takes advantage of the fact that when iterating over df, we iterate over each column name.
# Given just a list of new column names
df.rename(columns=dict(zip(df, new)))
x098 y765 z432
0 1 3 5
1 2 4 6
This works great if your original column names are unique. But if they are not, then this breaks down.
Setup 2
Non-unique columns
df = pd.DataFrame(
[[1, 3, 5], [2, 4, 6]],
columns=['Mahesh', 'Mahesh', 'Xin']
)
new = ['x098', 'y765', 'z432']
df
Mahesh Mahesh Xin
0 1 3 5
1 2 4 6
Solution 2
pd.concat using the keys argument
First, notice what happens when we attempt to use solution 1:
df.rename(columns=dict(zip(df, new)))
y765 y765 z432
0 1 3 5
1 2 4 6
We didn't map the new list as the column names. We ended up repeating y765. Instead, we can use the keys argument of the pd.concat function while iterating through the columns of df.
pd.concat([c for _, c in df.items()], axis=1, keys=new)
x098 y765 z432
0 1 3 5
1 2 4 6
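On newer pandas, set_axis gives the same positional relabelling and also copes with the duplicate names. This is an alternative I am adding rather than one of the numbered solutions, and it assumes pandas 1.0 or later, where set_axis returns a new object.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3, 5], [2, 4, 6]],
    columns=['Mahesh', 'Mahesh', 'Xin']
)
new = ['x098', 'y765', 'z432']

# Labels are assigned by position, so duplicate old names are not a problem.
renamed = df.set_axis(new, axis=1)

print(renamed)
```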
Solution 3
Reconstruct. This should only be used if you have a single dtype for all columns. Otherwise, you'll end up with dtype object for all columns and converting them back requires more dictionary work.
Single dtype
pd.DataFrame(df.values, df.index, new)
x098 y765 z432
0 1 3 5
1 2 4 6
Mixed dtype
pd.DataFrame(df.values, df.index, new).astype(dict(zip(new, df.dtypes)))
x098 y765 z432
0 1 3 5
1 2 4 6
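The dtype caveat is easy to see with a frame that mixes integers and floats. In this sketch the sample values are made up; with a purely numeric mix the common dtype is float64 rather than object, but the effect is the same: the per-column dtypes are lost until astype restores them.

```python
import pandas as pd

df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3.5, 4.5], 'Xin': [5, 6]})
new = ['x098', 'y765', 'z432']

# df.values is a single NumPy array, so every column shares one dtype.
rebuilt = pd.DataFrame(df.values, df.index, new)

# Map each new name to the dtype of the column in the same position.
restored = rebuilt.astype(dict(zip(new, df.dtypes)))

print(rebuilt.dtypes)
print(restored.dtypes)
```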
Solution 4
This is a gimmicky trick with transpose and set_index. pd.DataFrame.set_index allows us to set an index inline, but there is no corresponding set_columns. So we can transpose, then set_index, and transpose back. However, the same single dtype versus mixed dtype caveat from solution 3 applies here.
Single dtype
df.T.set_index(np.asarray(new)).T
x098 y765 z432
0 1 3 5
1 2 4 6
Mixed dtype
df.T.set_index(np.asarray(new)).T.astype(dict(zip(new, df.dtypes)))
x098 y765 z432
0 1 3 5
1 2 4 6
Solution 5
Use a lambda in pd.DataFrame.rename that cycles through each element of new.
In this solution, we pass a lambda that takes x but then ignores it. It also takes a y but doesn't expect it. Instead, an iterator is given as a default value and I can then use that to cycle through one at a time without regard to what the value of x is.
df.rename(columns=lambda x, y=iter(new): next(y))
x098 y765 z432
0 1 3 5
1 2 4 6
And as pointed out to me by the folks in sopython chat, if I add a * in between x and y, I can protect my y variable. Though, in this context I don't believe it needs protecting. It is still worth mentioning.
df.rename(columns=lambda x, *, y=iter(new): next(y))
x098 y765 z432
0 1 3 5
1 2 4 6
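The trick relies on plain Python semantics: a default argument is evaluated once, when the lambda is defined, so every call shares the same iterator. A pandas-free sketch, where relabel is a name I made up:

```python
new = ['x098', 'y765', 'z432']
old = ['Mahesh', 'Mahesh', 'Xin']

# iter(new) is created once; each call consumes the next element.
# The bare * makes y keyword-only, so a caller cannot overwrite it positionally.
relabel = lambda x, *, y=iter(new): next(y)

result = [relabel(name) for name in old]
print(result)

# Once the iterator is exhausted, a further call raises StopIteration.
try:
    relabel('extra')
    exhausted = False
except StopIteration:
    exhausted = True
```

This also means the lambda is single-use: build a fresh one for each rename call.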
Let's understand renaming by a small example...
Renaming columns using mapping:
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}) # Creating a df with column names A and B
df.rename({"A": "new_a", "B": "new_b"}, axis='columns', inplace=True) # Renaming column A to 'new_a' and B to 'new_b'
Output:
new_a new_b
0 1 4
1 2 5
2 3 6
Renaming index/Row_Name using mapping:
df.rename({0: "x", 1: "y", 2: "z"}, axis='index', inplace=True) # Row names are replaced by 'x', 'y', and 'z'
Output:
new_a new_b
x 1 4
y 2 5
z 3 6
Suppose your dataset name is df, and df has the columns
['$a', '$b', '$c', '$d', '$e']
So, to rename these, we would simply do:
df.columns = ['a','b','c','d','e']
Let's say this is your dataframe.
You can rename the columns using two methods.
Using dataframe.columns=[#list]
df.columns=['a','b','c','d','e']
The limitation of this method is that if one column has to be changed, the full column list has to be passed. Also, this method is not applicable to index labels. For example, if you passed this:
df.columns = ['a','b','c','d']
This will throw an error. Length mismatch: Expected axis has 5 elements, new values have 4 elements.
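A short sketch of that failure mode, using a made-up one-row frame. Assigning too few names raises a ValueError, while rename can change a single column without listing the rest.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], '$b': [1], '$c': [1], '$d': [1], '$e': [1]})

try:
    df.columns = ['a', 'b', 'c', 'd']  # only 4 names for 5 columns
    message = None
except ValueError as exc:
    message = str(exc)

print(message)

# Renaming one column does not require the full list.
df = df.rename(columns={'$a': 'a'})
print(list(df.columns))
```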
Another method is the Pandas rename() method, which is used to rename any index, column or row:
df = df.rename(columns={'$a':'a'})
Similarly, you can change any rows or columns.
Many pandas functions have an inplace parameter. When it is set to True, the transformation applies directly to the dataframe that you are calling it on. For example:
df = pd.DataFrame({'$a':[1,2], '$b': [3,4]})
df.rename(columns={'$a': 'a'}, inplace=True)
df.columns
>>> Index(['a', '$b'], dtype='object')
Alternatively, there are cases where you want to preserve the original dataframe. I have often seen people fall into this case if creating the dataframe is an expensive task. For example, if creating the dataframe required querying a Snowflake database. In this case, just make sure the inplace parameter is set to False.
df = pd.DataFrame({'$a':[1,2], '$b': [3,4]})
df2 = df.rename(columns={'$a': 'a'}, inplace=False)
df.columns
>>> Index(['$a', '$b'], dtype='object')
df2.columns
>>> Index(['a', '$b'], dtype='object')
If these types of transformations are something that you do often, you could also look into a number of different pandas GUI tools. I'm the creator of one called Mito. It's a spreadsheet that automatically converts your edits to Python code.
df.rename(index=str, columns={'A':'a', 'B':'b'})
If you've got the dataframe, df.columns dumps everything into a list you can manipulate and then reassign into your dataframe as the names of columns...
columns = df.columns
new_columns = [col.replace("$", "") for col in columns]
df.rename(columns=dict(zip(columns, new_columns)), inplace=True)
df.head() # To validate the output
Best way? I don't know. A way - yes.
A better way of evaluating all the main techniques put forward in the answers to the question is below, using cProfile to gauge execution time. @kadee, @kaitlyn, and @eumiro had the functions with the fastest execution times - though these functions are so fast we're comparing the rounding of 0.000 and 0.001 seconds for all the answers. Moral: my answer above likely isn't the 'best' way.
import pandas as pd
import cProfile, pstats, re
old_names = ['$a', '$b', '$c', '$d', '$e']
new_names = ['a', 'b', 'c', 'd', 'e']
col_dict = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}
df = pd.DataFrame({'$a':[1, 2], '$b': [10, 20], '$c': ['bleep', 'blorp'], '$d': [1, 2], '$e': ['texa$', '']})
df.head()
def eumiro(df, nn):
    # This direct renaming approach is duplicated in methodology in several other answers
    df.columns = nn
    return df
def lexual1(df):
    return df.rename(columns=col_dict)
def lexual2(df, col_dict):
    return df.rename(columns=col_dict, inplace=True)
def Panda_Master_Hayden(df):
    return df.rename(columns=lambda x: x[1:], inplace=True)
def paulo1(df):
    return df.rename(columns=lambda x: x.replace('$', ''))
def paulo2(df):
    return df.rename(columns=lambda x: x.replace('$', ''), inplace=True)
def migloo(df, on, nn):
    return df.rename(columns=dict(zip(on, nn)), inplace=True)
def kadee(df):
    return df.columns.str.replace('$', '', regex=False)
def awo(df):
    columns = df.columns
    new_columns = [col.replace("$", "") for col in columns]
    return df.rename(columns=dict(zip(columns, new_columns)), inplace=True)
def kaitlyn(df):
    df.columns = [col.strip('$') for col in df.columns]
    return df
print('eumiro')
cProfile.run('eumiro(df, new_names)')
print('lexual1')
cProfile.run('lexual1(df)')
print('lexual2')
cProfile.run('lexual2(df, col_dict)')
print('andy hayden')
cProfile.run('Panda_Master_Hayden(df)')
print('paulo1')
cProfile.run('paulo1(df)')
print('paulo2')
cProfile.run('paulo2(df)')
print('migloo')
cProfile.run('migloo(df, old_names, new_names)')
print('kadee')
cProfile.run('kadee(df)')
print('awo')
cProfile.run('awo(df)')
print('kaitlyn')
cProfile.run('kaitlyn(df)')
df = pd.DataFrame({'$a': [1], '$b': [1], '$c': [1], '$d': [1], '$e': [1]})
If your new list of columns is in the same order as the existing columns, the assignment is simple:
new_cols = ['a', 'b', 'c', 'd', 'e']
df.columns = new_cols
>>> df
a b c d e
0 1 1 1 1 1
If you had a dictionary mapping old column names to new column names, you could do the following:
d = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}
df.columns = df.columns.map(lambda col: d[col]) # Or `.map(d.get)` as pointed out by @PiRSquared.
>>> df
a b c d e
0 1 1 1 1 1
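One caveat with the mapping approach: d[col] raises a KeyError for any column missing from the dictionary, and d.get returns None for it. A sketch that falls back to the original label instead, where the extra column 'keep' is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], '$b': [1], 'keep': [1]})
d = {'$a': 'a', '$b': 'b'}

# d.get(col, col) returns the old label when no replacement is defined.
df.columns = df.columns.map(lambda col: d.get(col, col))

print(list(df.columns))
```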
If you don't have a list or dictionary mapping, you could strip the leading $ symbol via a list comprehension:
df.columns = [col[1:] if col[0] == '$' else col for col in df]
Another way we could replace the original column labels is by stripping the unwanted characters (here '$') from the original column labels.
This could have been done by running a for loop over df.columns and collecting the stripped labels into a new list to assign back to df.columns.
Instead, we can do this neatly in a single statement by using a list comprehension like below:
df.columns = [col.strip('$') for col in df.columns]
(The strip method in Python strips the given character from the beginning and end of the string.)
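Since strip works on both ends, it can remove more than intended when a label also ends with '$'. A small pure-Python comparison with lstrip, using made-up labels:

```python
labels = ['$a', '$b$', 'c']

# strip removes the character from both ends of each label.
both_ends = [s.strip('$') for s in labels]

# lstrip removes it from the beginning only.
leading_only = [s.lstrip('$') for s in labels]

print(both_ends)
print(leading_only)
```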
It is really simple. Just use:
df.columns = ['Name1', 'Name2', 'Name3'...]
And it will assign the column names in the order you put them in.
If you already have a list for the new column names, you can try this:
new_cols = ['a', 'b', 'c', 'd', 'e']
new_names_map = {df.columns[i]:new_cols[i] for i in range(len(new_cols))}
df.rename(new_names_map, axis=1, inplace=True)
# This way it will work
import pandas as pd
# Define a dictionary
rankings = {'test': ['a'],
'odi': ['E'],
't20': ['P']}
# Convert the dictionary into DataFrame
rankings_pd = pd.DataFrame(rankings)
# Before renaming the columns
print(rankings_pd)
rankings_pd.rename(columns = {'test':'TEST'}, inplace = True)
You can use str.slice:
df.columns = df.columns.str.slice(1)
Another option is to rename using a regular expression:
import pandas as pd
import re
df = pd.DataFrame({'$a':[1,2], '$b':[3,4], '$c':[5,6]})
df = df.rename(columns=lambda x: re.sub(r'\$','',x))
>>> df
a b c
0 1 3 5
1 2 4 6
My method is generic: you can add additional delimiters to the delimiters= variable and future-proof it.
Working Code:
import pandas as pd
import re
df = pd.DataFrame({'$a':[1,2], '$b': [3,4],'$c':[5,6], '$d': [7,8], '$e': [9,10]})
delimiters = '$'
matchPattern = '|'.join(map(re.escape, delimiters))
df.columns = [re.split(matchPattern, i)[1] for i in df.columns ]
Output:
>>> df
$a $b $c $d $e
0 1 3 5 7 9
1 2 4 6 8 10
>>> df
a b c d e
0 1 3 5 7 9
1 2 4 6 8 10
Note that not all of the approaches in previous answers work for a MultiIndex. For a MultiIndex, you need to do something like the following:
>>> df = pd.DataFrame({('$a','$x'):[1,2], ('$b','$y'): [3,4], ('e','f'):[5,6]})
>>> df
$a $b e
$x $y f
0 1 3 5
1 2 4 6
>>> rename = {('$a','$x'):('a','x'), ('$b','$y'):('b','y')}
>>> df.columns = pd.MultiIndex.from_tuples([
...     rename.get(item, item) for item in df.columns.tolist()])
>>> df
a b e
x y f
0 1 3 5
1 2 4 6
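As a possible alternative to rebuilding the tuples, rename with a function is applied to the labels of every level of a MultiIndex. This is a sketch I am adding based on my understanding of recent pandas behaviour, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({('$a', '$x'): [1, 2], ('$b', '$y'): [3, 4], ('e', 'f'): [5, 6]})

# The function receives individual labels from each level, not whole tuples.
renamed = df.rename(columns=lambda label: label.replace('$', ''))

print(renamed.columns.tolist())
```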
If you have to deal with loads of columns named by a providing system that is out of your control, I came up with the following approach, which combines a general approach and specific replacements in one go.
First create a dictionary from the dataframe column names using regular expressions in order to throw away certain suffixes of column names, and then add specific replacements to the dictionary to name core columns as expected later in the receiving database.
This is then applied to the dataframe in one go.
rename_map = dict(zip(df.columns, df.columns.str.replace(r'(:S$|:C1$|:L$|:D$|\.Serial:L$)', '', regex=True)))
rename_map['brand_timeseries:C1'] = 'BTS'
rename_map['respid:L'] = 'RespID'
rename_map['country:C1'] = 'CountryID'
rename_map['pim1:D'] = 'pim_actual'
df.rename(columns=rename_map, inplace=True)
If you only want to remove the '$' sign, use the code below:
df.columns = pd.Series(df.columns.str.replace("$", "", regex=False))
In addition to the solutions already provided, you can replace all the columns while you are reading the file. We can use names and header=0 to do that.
First, we create a list of the names that we like to use as our column names:
import pandas as pd
ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']
ufo = pd.read_csv('link to the file you are using', names = ufo_cols, header = 0)
# For a DataFrame that is already loaded, the equivalent is: ufo.columns = ufo_cols
In this case, all the column names will be replaced with the names you have in your list.
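A self-contained sketch of the same idea, reading from an in-memory string instead of a file. The CSV content below is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical file content: one header row and one data row.
csv_text = (
    "City,Colors Reported,Shape Reported,State,Time\n"
    "Ithaca,,TRIANGLE,NY,6/1/1930 22:00\n"
)
ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']

# header=0 marks the first row as the header to discard; names supplies the replacements.
ufo = pd.read_csv(io.StringIO(csv_text), names=ufo_cols, header=0)

print(list(ufo.columns))
```

Without header=0, the original header row would be read as a data row.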
Here's a nifty little function I like to use to cut down on typing:
def rename(data, oldnames, newname):
    if isinstance(oldnames, str): # Input can be a string or list of strings
        oldnames = [oldnames] # When renaming multiple columns
        newname = [newname] # Make sure you pass the corresponding list of new names
    i = 0
    for name in oldnames:
        oldvar = [c for c in data.columns if name in c]
        if len(oldvar) == 0:
            raise ValueError("Sorry, couldn't find that column in the dataset")
        if len(oldvar) > 1: # Doesn't have to be an exact match
            print("Found multiple columns that matched " + str(name) + ": ")
            for c in oldvar:
                print(str(oldvar.index(c)) + ": " + str(c))
            ind = input('Please enter the index of the column you would like to rename: ')
            oldvar = oldvar[int(ind)]
        else: # Exactly one match
            oldvar = oldvar[0]
        data = data.rename(columns = {oldvar : newname[i]})
        i += 1
    return data
Here is an example of how it works:
In [2]: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns = ['col1', 'col2', 'omg', 'idk'])
# First list = existing variables
# Second list = new names for those variables
In [3]: df = rename(df, ['col', 'omg'],['first', 'ohmy'])
Found multiple columns that matched col:
0: col1
1: col2
Please enter the index of the column you would like to rename: 0
In [4]: df.columns
Out[4]: Index(['first', 'col2', 'ohmy', 'idk'], dtype='object')
Assuming you can use a regular expression, this solution removes the need for manual encoding:
import pandas as pd
import re
srch = re.compile(r"\w+")
data = pd.read_csv("CSV_FILE.csv")
cols = data.columns
new_cols = list(map(lambda v:v.group(), (list(map(srch.search, cols)))))
data.columns = new_cols
I needed to rename features for XGBoost, and it didn't like any of these:
import re
regex = r"[!\"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~ ]+"
X_trn.columns = X_trn.columns.str.replace(regex, '_', regex=True)
X_tst.columns = X_tst.columns.str.replace(regex, '_', regex=True)
My one-line answer, df.columns = df_new_cols, is the best one, taking about 1/3 of the processing time.
timeit comparison: df has 7 columns. I am trying to change a few of the names.
%timeit df.rename(columns={old_col:new_col for (old_col,new_col) in zip(df_old_cols,df_new_cols)},inplace=True)
214 µs ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.rename(columns=dict(zip(df_old_cols,df_new_cols)),inplace=True)
212 µs ± 7.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.columns = df_new_cols
72.9 µs ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)