Renaming column names in Pandas

I want to change the column labels of a Pandas DataFrame from

['$a', '$b', '$c', '$d', '$e']

to

['a', 'b', 'c', 'd', 'e']

RENAME SPECIFIC COLUMNS

Use the df.rename() function and pass a mapping for the columns to be renamed. Not all the columns have to be renamed:

df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
# Or rename the existing DataFrame (rather than creating a copy) 
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)

Minimal Code Example

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))
df

   a  b  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

The following methods all work and produce the same output:

df2 = df.rename({'a': 'X', 'b': 'Y'}, axis=1)  # new method
df2 = df.rename({'a': 'X', 'b': 'Y'}, axis='columns')
df2 = df.rename(columns={'a': 'X', 'b': 'Y'})  # old method  

df2

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

Remember to assign the result back, as the modification is not in place. Alternatively, specify inplace=True:

df.rename({'a': 'X', 'b': 'Y'}, axis=1, inplace=True)
df

   X  Y  c  d  e
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x
 

From v0.25, you can also specify errors='raise' to raise errors if an invalid column-to-rename is specified. See the v0.25 rename() docs.
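To make the effect of errors='raise' concrete, here is a minimal sketch. It assumes pandas 0.25 or later, and the label 'missing' is a made-up name that does not exist in the frame.

```python
import pandas as pd

df = pd.DataFrame('x', index=range(3), columns=list('abcde'))

# Default behaviour: a label that is not in the frame is silently skipped.
ignored = df.rename(columns={'missing': 'X'})

# With errors='raise', the same call raises a KeyError instead.
try:
    df.rename(columns={'missing': 'X'}, errors='raise')
    raised = False
except KeyError:
    raised = True
```

This can be handy for catching typos in the mapping that would otherwise go unnoticed.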


REASSIGN COLUMN HEADERS

Use df.set_axis() with axis=1. In current pandas it returns a copy; the inplace parameter used in older examples was removed in pandas 2.0.

df2 = df.set_axis(['V', 'W', 'X', 'Y', 'Z'], axis=1)
df2

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

This returns a copy. In versions <=0.24 the default was inplace=True, so code written for those versions had to pass inplace=False explicitly to get a copy.

You can also assign headers directly:

df.columns = ['V', 'W', 'X', 'Y', 'Z']
df

   V  W  X  Y  Z
0  x  x  x  x  x
1  x  x  x  x  x
2  x  x  x  x  x

Just assign it to the .columns attribute:

>>> df = pd.DataFrame({'$a':[1,2], '$b': [10,20]})
>>> df
   $a  $b
0   1  10
1   2  20

>>> df.columns = ['a', 'b']
>>> df
   a   b
0  1  10
1  2  20

The rename method can take a function, for example:

In [11]: df.columns
Out[11]: Index([u'$a', u'$b', u'$c', u'$d', u'$e'], dtype=object)

In [12]: df.rename(columns=lambda x: x[1:], inplace=True)

In [13]: df.columns
Out[13]: Index([u'a', u'b', u'c', u'd', u'e'], dtype=object)

As documented in Working with text data:

df.columns = df.columns.str.replace('$', '', regex=False)
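One thing to watch with str.replace here: '$' is the end-of-string anchor in a regular expression, and the default of the regex argument has changed between pandas versions. The sketch below, on a made-up Index, passes the argument explicitly so the result does not depend on the installed version.

```python
import pandas as pd

cols = pd.Index(['$a', '$b', '$c'])

# regex=False treats '$' as a literal character rather than a regex anchor.
literal = cols.str.replace('$', '', regex=False)

# Equivalent regex form: the dollar sign has to be escaped.
escaped = cols.str.replace(r'\$', '', regex=True)
```

Both calls should give the labels a, b and c.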

Pandas 0.21+ Answer

There have been some significant updates to column renaming in version 0.21.

  • The rename method has added the axis parameter, which may be set to columns or 1. This update makes this method match the rest of the pandas API. It still has the index and columns parameters, but you are no longer forced to use them.
  • The set_axis method with inplace set to False enables you to rename all the index or column labels with a list.

Examples for Pandas 0.21+

Construct sample DataFrame:

df = pd.DataFrame({'$a':[1,2], '$b': [3,4], 
                   '$c':[5,6], '$d':[7,8], 
                   '$e':[9,10]})

   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

Using rename with axis='columns' or axis=1

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis='columns')

or

df.rename({'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'}, axis=1)

Both result in the following:

   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10

It is still possible to use the old method signature:

df.rename(columns={'$a':'a', '$b':'b', '$c':'c', '$d':'d', '$e':'e'})

The rename function also accepts functions that will be applied to each column name.

df.rename(lambda x: x[1:], axis='columns')

or

df.rename(lambda x: x[1:], axis=1)

Using set_axis with a list and inplace=False

You can supply a list to the set_axis method that is equal in length to the number of columns (or index). As of pandas 0.21, inplace defaults to True, but it will default to False in future releases. (pandas 2.0 removed the inplace parameter altogether; there, omit it and set_axis always returns a new object.)

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis='columns', inplace=False)

or

df.set_axis(['a', 'b', 'c', 'd', 'e'], axis=1, inplace=False)

Why not use df.columns = ['a', 'b', 'c', 'd', 'e']?

There is nothing wrong with assigning columns directly like this. It is a perfectly good solution.

The advantage of using set_axis is that it can be used as part of a method chain and that it returns a new copy of the DataFrame. Without it, you would have to store your intermediate steps of the chain to another variable before reassigning the columns.

# new for pandas 0.21+ (some_method1 etc. are placeholders, not real pandas methods)
(df.some_method1()
   .some_method2()
   .set_axis(columns, axis=1)
   .some_method3())

# old way
df1 = (df.some_method1()
         .some_method2())
df1.columns = columns
df1.some_method3()
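For a runnable version of the chaining idea, here is a small sketch with real methods. The data and the choice of sort_values, reset_index and assign are only illustrative, and it assumes a pandas version where set_axis returns a new object by default.

```python
import pandas as pd

df = pd.DataFrame({'$a': [3, 1, 2], '$b': [30, 10, 20]})

# Parentheses let the chain span several lines.
result = (
    df.sort_values('$a')
      .reset_index(drop=True)
      .set_axis(['a', 'b'], axis=1)              # relabel columns mid-chain
      .assign(total=lambda d: d['a'] + d['b'])   # later steps see the new names
)
```

The original df keeps its '$a' and '$b' labels; only result carries the new ones.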

Since you only want to remove the $ sign in all column names, you could just do:

df = df.rename(columns=lambda x: x.replace('$', ''))

OR

df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

Renaming columns in Pandas is an easy task.

df.rename(columns={'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}, inplace=True)
df.columns = ['a', 'b', 'c', 'd', 'e']

It will replace the existing names with the names you provide, in the order you provide them.

Use:

old_names = ['$a', '$b', '$c', '$d', '$e']
new_names = ['a', 'b', 'c', 'd', 'e']
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)

This way you can manually edit the new_names as you wish. It works great when you need to rename only a few columns to correct misspellings, accents, remove special characters, etc.

Column names vs Names of Series

I would like to explain a bit what happens behind the scenes.

Dataframes are a set of Series.

Series in turn are built on top of a numpy.array.

On top of the array, a Series has a property .name.

This is the name of the series. It is seldom that Pandas respects this attribute, but it lingers in places and can be used to hack some Pandas behaviors.

Naming the list of columns

A lot of answers here talk about the df.columns attribute being a list when in fact it is an Index. This means it has a .name attribute.

This is what happens if you decide to fill in the name of the columns Index:

df.columns = ['column_one', 'column_two']
df.columns.names = ['name of the list of columns']
df.index.names = ['name of the index']

name of the list of columns     column_one  column_two
name of the index
0                                    4           1
1                                    5           2
2                                    6           3

Note that the name of the index always comes one row lower.
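A quick way to see what df.columns really is and to attach both names is sketched below. The sample values are made up, and rename_axis is used here as a non-mutating alternative to assigning .names directly.

```python
import pandas as pd

df = pd.DataFrame({'column_one': [4, 5, 6], 'column_two': [1, 2, 3]})

# df.columns is an Index object, not a plain list.
is_index = isinstance(df.columns, pd.Index)

# rename_axis returns a new frame with the axis names set.
named = df.rename_axis(index='name of the index',
                       columns='name of the list of columns')
```

Printing named should show the layout from the example above, with the index name on its own row.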

Artefacts that linger

The .name attribute lingers on sometimes. If you set df.columns = ['one', 'two'] then the df.one.name will be 'one'.

If you set df.one.name = 'three' then df.columns will still give you ['one', 'two'], and df.one.name will give you 'three'.

BUT

pd.DataFrame(df.one) will return

    three
0       1
1       2
2       3

Because Pandas reuses the .name of the already defined Series.

Multi-level column names

Pandas has ways of doing multi-layered column names. There is not so much magic involved, but I wanted to cover this in my answer too since I don't see anyone picking up on this here.

    |one            |
    |one      |two  |
0   |  4      |  1  |
1   |  5      |  2  |
2   |  6      |  3  |

This is easily achievable by setting columns to lists, like this:

df.columns = [['one', 'one'], ['one', 'two']]
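The same two-level header can also be built explicitly with pd.MultiIndex.from_arrays, which makes the structure a little easier to see. This is only a sketch on made-up data.

```python
import pandas as pd

df = pd.DataFrame({'a': [4, 5, 6], 'b': [1, 2, 3]})

# One inner list per level: the outer labels first, then the inner labels.
df.columns = pd.MultiIndex.from_arrays([['one', 'one'], ['one', 'two']])

levels = df.columns.nlevels
inner = df[('one', 'two')].tolist()   # select a column by its full tuple
```

Each column is then addressed by a tuple with one entry per level.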

One line or Pipeline solutions

I'll focus on two things:

  1. OP clearly states

    I have the edited column names stored it in a list, but I don't know how to replace the column names.

    I do not want to solve the problem of how to replace '$' or strip the first character off of each column header. OP has already done this step. Instead I want to focus on replacing the existing columns object with a new one given a list of replacement column names.

  2. df.columns = new, where new is the list of new column names, is as simple as it gets. The drawback of this approach is that it requires editing the existing dataframe's columns attribute and it isn't done inline. I'll show a few ways to perform this via pipelining without editing the existing dataframe.


Setup 1
To focus on the need to rename or replace column names with a pre-existing list, I'll create a new sample dataframe df with initial column names and unrelated new column names.

df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3, 4], 'Xin': [5, 6]})
new = ['x098', 'y765', 'z432']

df

   Jack  Mahesh  Xin
0     1       3    5
1     2       4    6

Solution 1
pd.DataFrame.rename

It has been said already that if you had a dictionary mapping the old column names to new column names, you could use pd.DataFrame.rename.

d = {'Jack': 'x098', 'Mahesh': 'y765', 'Xin': 'z432'}
df.rename(columns=d)

   x098  y765  z432
0     1     3     5
1     2     4     6

However, you can easily create that dictionary and include it in the call to rename. The following takes advantage of the fact that when iterating over df, we iterate over each column name.

# Given just a list of new column names
df.rename(columns=dict(zip(df, new)))

   x098  y765  z432
0     1     3     5
1     2     4     6

This works great if your original column names are unique. But if they are not, then this breaks down.


Setup 2
Non-unique columns

df = pd.DataFrame(
    [[1, 3, 5], [2, 4, 6]],
    columns=['Mahesh', 'Mahesh', 'Xin']
)
new = ['x098', 'y765', 'z432']

df

   Mahesh  Mahesh  Xin
0       1       3    5
1       2       4    6

Solution 2
pd.concat using the keys argument

First, notice what happens when we attempt to use solution 1:

df.rename(columns=dict(zip(df, new)))

   y765  y765  z432
0     1     3     5
1     2     4     6

We didn't map the new list as the column names. We ended up repeating y765. Instead, we can use the keys argument of the pd.concat function while iterating through the columns of df.

pd.concat([c for _, c in df.items()], axis=1, keys=new) 

   x098  y765  z432
0     1     3     5
1     2     4     6
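To see why the dictionary approach repeats y765, it may help to look at the mapping itself. The sketch below also shows set_axis as another positional option; that is a swapped-in alternative, not the pd.concat approach above, and it assumes a pandas version where set_axis returns a new object by default.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 5], [2, 4, 6]], columns=['Mahesh', 'Mahesh', 'Xin'])
new = ['x098', 'y765', 'z432']

# A dict holds each key once, so the second 'Mahesh' overwrites the first.
mapping = dict(zip(df, new))

# Positional relabelling never goes through a dict, so duplicates are fine.
relabelled = df.set_axis(new, axis=1)
```

Here mapping ends up with only two entries, which is exactly what rename then applies to both 'Mahesh' columns.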

Solution 3
Reconstruct. This should only be used if you have a single dtype for all columns. Otherwise, you'll end up with dtype object for all columns and converting them back requires more dictionary work.

Single dtype

pd.DataFrame(df.values, df.index, new)

   x098  y765  z432
0     1     3     5
1     2     4     6

Mixed dtype

pd.DataFrame(df.values, df.index, new).astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6
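The dtype caveat is easier to see on a frame that really has mixed dtypes. The following sketch uses a made-up frame with an integer, a boolean and a float column.

```python
import pandas as pd

df = pd.DataFrame({'n': [1, 2], 'flag': [True, False], 'f': [1.5, 2.5]})
new = ['x098', 'y765', 'z432']

# df.values has to fit every column into one array, so it falls back to object.
rebuilt = pd.DataFrame(df.values, df.index, new)
all_object = (rebuilt.dtypes == object).all()

# Cast back using the original dtypes, matched to the new names by position.
restored = rebuilt.astype(dict(zip(new, df.dtypes)))
```

After the astype call, the dtypes of restored should line up with those of df again.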

Solution 4
This is a gimmicky trick with transpose and set_index. pd.DataFrame.set_index allows us to set an index inline, but there is no corresponding set_columns. So we can transpose, then set_index, and transpose back. However, the same single dtype versus mixed dtype caveat from solution 3 applies here.

Single dtype

df.T.set_index(np.asarray(new)).T

   x098  y765  z432
0     1     3     5
1     2     4     6

Mixed dtype

df.T.set_index(np.asarray(new)).T.astype(dict(zip(new, df.dtypes)))

   x098  y765  z432
0     1     3     5
1     2     4     6

Solution 5
Use a lambda in pd.DataFrame.rename that cycles through each element of new.
In this solution, we pass a lambda that takes x but then ignores it. It also takes a y but doesn't expect it. Instead, an iterator is given as a default value and I can then use that to cycle through one at a time without regard to what the value of x is.

df.rename(columns=lambda x, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6

And as pointed out to me by the folks in sopython chat, if I add a * in between x and y, I can protect my y variable. Though, in this context I don't believe it needs protecting. It is still worth mentioning.

df.rename(columns=lambda x, *, y=iter(new): next(y))

   x098  y765  z432
0     1     3     5
1     2     4     6
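The trick relies on Python evaluating a default argument once, when the lambda is created. A small sketch of the consequence follows; the name relabel is made up, and it assumes rename calls the function once per column label, in order.

```python
import pandas as pd

df = pd.DataFrame({'Jack': [1, 2], 'Mahesh': [3, 4], 'Xin': [5, 6]})
new = ['x098', 'y765', 'z432']

# iter(new) is created once, so every call advances the same iterator.
relabel = lambda x, y=iter(new): next(y)

renamed = df.rename(columns=relabel)

# The iterator is now used up; one more call has nothing left to hand out.
try:
    relabel('Jack')
    exhausted = False
except StopIteration:
    exhausted = True
```

So the lambda is single-use: it needs to be rebuilt for every rename call.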

Let's understand renaming by a small example...

  1. Renaming columns using mapping:

     # Creating a df with column names A and B
     df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
     # Renaming column A to 'new_a' and B to 'new_b'
     df.rename({"A": "new_a", "B": "new_b"}, axis='columns', inplace=True)

     Output:

        new_a  new_b
     0      1      4
     1      2      5
     2      3      6

  2. Renaming index/Row_Name using mapping:

     # Row names are replaced by 'x', 'y', and 'z'
     df.rename({0: "x", 1: "y", 2: "z"}, axis='index', inplace=True)

     Output:

        new_a  new_b
     x      1      4
     y      2      5
     z      3      6

Suppose your dataset is named df, and df has the columns:

['$a', '$b', '$c', '$d', '$e']

So, to rename these, we would simply do:

df.columns = ['a','b','c','d','e']

Let's say your dataframe has the five columns from the question.

You can rename the columns using two methods.

  1. Using dataframe.columns=[#list]

     df.columns=['a','b','c','d','e']

    The limitation of this method is that if one column has to be changed, the full column list has to be passed. Also, this method is not applicable to index labels. For example, if you passed this:

     df.columns = ['a','b','c','d']

    This will throw an error. Length mismatch: Expected axis has 5 elements, new values have 4 elements.

  2. Another method is the Pandas rename() method, which is used to rename any index, column or row

    df = df.rename(columns={'$a':'a'})

Similarly, you can change any rows or columns.
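The length-mismatch limitation of the first method can be reproduced with a short sketch like this one, using a made-up one-row frame:

```python
import pandas as pd

df = pd.DataFrame({'$a': [1], '$b': [2], '$c': [3], '$d': [4], '$e': [5]})

# Four labels for five columns: pandas refuses the assignment.
try:
    df.columns = ['a', 'b', 'c', 'd']
    message = None
except ValueError as exc:
    message = str(exc)

# rename() only needs the labels that actually change.
df = df.rename(columns={'$a': 'a'})
```

The failed assignment leaves the original labels untouched, so the rename afterwards still finds '$a'.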

Many pandas functions have an inplace parameter. When it is set to True, the transformation applies directly to the dataframe that you are calling it on. For example:

df = pd.DataFrame({'$a':[1,2], '$b': [3,4]})
df.rename(columns={'$a': 'a'}, inplace=True)
df.columns

>>> Index(['a', '$b'], dtype='object')

Alternatively, there are cases where you want to preserve the original dataframe. I have often seen people fall into this case if creating the dataframe is an expensive task, for example, if creating the dataframe requires querying a Snowflake database. In this case, just make sure the inplace parameter is set to False.

df = pd.DataFrame({'$a':[1,2], '$b': [3,4]})
df2 = df.rename(columns={'$a': 'a'}, inplace=False)
df.columns
    
>>> Index(['$a', '$b'], dtype='object')

df2.columns

>>> Index(['a', '$b'], dtype='object')

If these types of transformations are something that you do often, you could also look into a number of different pandas GUI tools. I'm the creator of one called Mito. It's a spreadsheet that automatically converts your edits to Python code.
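One pitfall with inplace=True that is worth a quick sketch: the call returns None, so the result should not be assigned back.

```python
import pandas as pd

df = pd.DataFrame({'$a': [1, 2], '$b': [3, 4]})

# With inplace=True the frame itself is modified and the call returns None.
result = df.rename(columns={'$a': 'a'}, inplace=True)
```

Writing df = df.rename(..., inplace=True) would therefore replace the DataFrame with None.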

df.rename(index=str, columns={'A':'a', 'B':'b'})

pandas.DataFrame.rename

If you've got the dataframe, df.columns gives you all the labels in a list-like object you can manipulate and then reassign into your dataframe as the names of columns...

old_names = df.columns
new_names = [name.replace("$", "") for name in old_names]
df.rename(columns=dict(zip(old_names, new_names)), inplace=True)
df.head() # To validate the output

Best way? I don't know. A way - yes.

A better way of evaluating all the main techniques put forward in the answers to the question is below, using cProfile to gauge execution time. @kadee, @kaitlyn, and @eumiro had the functions with the fastest execution times, though these functions are so fast that we're comparing the rounding of 0.000 and 0.001 seconds for all the answers. Moral: my answer above likely isn't the 'best' way.

import pandas as pd
import cProfile, pstats, re

old_names = ['$a', '$b', '$c', '$d', '$e']
new_names = ['a', 'b', 'c', 'd', 'e']
col_dict = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}

df = pd.DataFrame({'$a':[1, 2], '$b': [10, 20], '$c': ['bleep', 'blorp'], '$d': [1, 2], '$e': ['texa$', '']})

df.head()

def eumiro(df, nn):
    df.columns = nn
    # This direct renaming approach is duplicated in methodology in several other answers:
    return df

def lexual1(df):
    return df.rename(columns=col_dict)

def lexual2(df, col_dict):
    return df.rename(columns=col_dict, inplace=True)

def Panda_Master_Hayden(df):
    return df.rename(columns=lambda x: x[1:], inplace=True)

def paulo1(df):
    return df.rename(columns=lambda x: x.replace('$', ''))

def paulo2(df):
    return df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

def migloo(df, on, nn):
    return df.rename(columns=dict(zip(on, nn)), inplace=True)

def kadee(df):
    return df.columns.str.replace('$', '', regex=False)

def awo(df):
    old = df.columns
    new = [name.replace("$", "") for name in old]
    return df.rename(columns=dict(zip(old, new)), inplace=True)

def kaitlyn(df):
    df.columns = [col.strip('$') for col in df.columns]
    return df

print('eumiro')
cProfile.run('eumiro(df, new_names)')
print('lexual1')
cProfile.run('lexual1(df)')
print('lexual2')
cProfile.run('lexual2(df, col_dict)')
print('andy hayden')
cProfile.run('Panda_Master_Hayden(df)')
print('paulo1')
cProfile.run('paulo1(df)')
print('paulo2')
cProfile.run('paulo2(df)')
print('migloo')
cProfile.run('migloo(df, old_names, new_names)')
print('kadee')
cProfile.run('kadee(df)')
print('awo')
cProfile.run('awo(df)')
print('kaitlyn')
cProfile.run('kaitlyn(df)')

df = pd.DataFrame({'$a': [1], '$b': [1], '$c': [1], '$d': [1], '$e': [1]})

If your new list of columns is in the same order as the existing columns, the assignment is simple:

new_cols = ['a', 'b', 'c', 'd', 'e']
df.columns = new_cols
>>> df
   a  b  c  d  e
0  1  1  1  1  1

If you had a dictionary mapping old column names to new column names, you could do the following:

d = {'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}
df.columns = df.columns.map(lambda col: d[col])  # Or `.map(d.get)` as pointed out by @PiRSquared.
>>> df
   a  b  c  d  e
0  1  1  1  1  1

If you don't have a list or dictionary mapping, you could strip the leading $ symbol via a list comprehension:

df.columns = [col[1:] if col[0] == '$' else col for col in df]

Another way we could replace the original column labels is by stripping the unwanted characters (here '$') from the original column labels.

This could have been done by running a for loop over df.columns and appending the stripped labels to a new list that is then assigned to df.columns.

Instead, we can do this neatly in a single statement by using a list comprehension like below:

df.columns = [col.strip('$') for col in df.columns]

(The strip method in Python strips the given character from the beginning and end of the string.)
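Since strip works on both ends, it is worth comparing it with lstrip and replace on labels that carry a '$' in other positions. The labels below are made up for the comparison.

```python
cols = ['$a', '$b$', 'c$d']

stripped = [c.strip('$') for c in cols]        # both ends
left_only = [c.lstrip('$') for c in cols]      # leading characters only
replaced = [c.replace('$', '') for c in cols]  # every occurrence
```

Which one fits depends on whether a '$' inside or at the end of a label is meaningful and should be kept.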

It is real simple. Just use:

df.columns = ['Name1', 'Name2', 'Name3'...]

And it will assign the column names in the order you put them in.

If you already have a list for the new column names, you can try this:

new_cols = ['a', 'b', 'c', 'd', 'e']
new_names_map = {df.columns[i]:new_cols[i] for i in range(len(new_cols))}

df.rename(new_names_map, axis=1, inplace=True)

# This way it will work
import pandas as pd

# Define a dictionary 
rankings = {'test': ['a'],
        'odi': ['E'],
        't20': ['P']}

# Convert the dictionary into DataFrame
rankings_pd = pd.DataFrame(rankings)

# Before renaming the columns
print(rankings_pd)

rankings_pd.rename(columns = {'test':'TEST'}, inplace = True)

You can use str.slice for that:

df.columns = df.columns.str.slice(1)

Another option is to rename using a regular expression:

import pandas as pd
import re

df = pd.DataFrame({'$a':[1,2], '$b':[3,4], '$c':[5,6]})

df = df.rename(columns=lambda x: re.sub(r'\$','',x))
>>> df
   a  b  c
0  1  3  5
1  2  4  6

My method is generic: you can handle additional delimiters by adding them, comma-separated, to the delimiters variable, which future-proofs it.

Working Code:

import pandas as pd
import re


df = pd.DataFrame({'$a':[1,2], '$b': [3,4],'$c':[5,6], '$d': [7,8], '$e': [9,10]})

delimiters = '$'
matchPattern = '|'.join(map(re.escape, delimiters))
df.columns = [re.split(matchPattern, i)[1] for i in df.columns ]

Output:

>>> df
   $a  $b  $c  $d  $e
0   1   3   5   7   9
1   2   4   6   8  10

>>> df
   a  b  c  d   e
0  1  3  5  7   9
1  2  4  6  8  10

Note that the approaches in previous answers do not work for a MultiIndex. For a MultiIndex, you need to do something like the following:

>>> df = pd.DataFrame({('$a','$x'):[1,2], ('$b','$y'): [3,4], ('e','f'):[5,6]})
>>> df
   $a $b  e
   $x $y  f
0  1  3  5
1  2  4  6
>>> rename = {('$a','$x'):('a','x'), ('$b','$y'):('b','y')}
>>> df.columns = pd.MultiIndex.from_tuples([
        rename.get(item, item) for item in df.columns.tolist()])
>>> df
   a  b  e
   x  y  f
0  1  3  5
1  2  4  6

If you have to deal with loads of columns named by a providing system outside your control, I came up with the following approach, which combines a general rule and specific replacements in one go.

First create a dictionary from the dataframe column names using regular expressions in order to throw away certain suffixes of column names, and then add specific replacements to the dictionary to name core columns as expected later in the receiving database.

This is then applied to the dataframe in one go.

# Named rename_map rather than dict to avoid shadowing the built-in
rename_map = dict(zip(df.columns, df.columns.str.replace(r'(:S$|:C1$|:L$|:D$|\.Serial:L$)', '', regex=True)))
rename_map['brand_timeseries:C1'] = 'BTS'
rename_map['respid:L'] = 'RespID'
rename_map['country:C1'] = 'CountryID'
rename_map['pim1:D'] = 'pim_actual'
df.rename(columns=rename_map, inplace=True)

If you only want to remove the '$' sign, use the following code:

df.columns = pd.Series(df.columns.str.replace("$", "", regex=False))

In addition to the solutions already provided, you can replace all the columns while you are reading the file. We can use names and header=0 to do that.

First, we create a list of the names that we like to use as our column names:

import pandas as pd

ufo_cols = ['city', 'color reported', 'shape reported', 'state', 'time']

ufo = pd.read_csv('link to the file you are using', names = ufo_cols, header = 0)

In this case, all the column names will be replaced with the names you have in your list.
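To try this without a file on disk, the sketch below reads from an in-memory string instead. The CSV content and the column names are made up for the illustration.

```python
import io
import pandas as pd

csv_text = "City,Colors Reported,State\nIthaca,RED,NY\nTroy,,NY\n"
ufo_cols = ['city', 'colors_reported', 'state']

# header=0 marks the first line as the header, which names= then replaces.
ufo = pd.read_csv(io.StringIO(csv_text), names=ufo_cols, header=0)
```

Without header=0, the original header line would be read as a data row.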

Here's a nifty little function I like to use to cut down on typing:

def rename(data, oldnames, newname):
    if isinstance(oldnames, str):  # Input can be a string or a list of strings
        oldnames = [oldnames]      # When renaming multiple columns,
        newname = [newname]        # pass the corresponding list of new names
    for i, name in enumerate(oldnames):
        # Substring match, so the name doesn't have to be an exact match
        matches = [c for c in data.columns if name in c]
        if len(matches) == 0:
            raise ValueError("Sorry, couldn't find that column in the dataset")
        if len(matches) > 1:
            print("Found multiple columns that matched " + str(name) + ": ")
            for j, c in enumerate(matches):
                print(str(j) + ": " + str(c))
            ind = input('Please enter the index of the column you would like to rename: ')
            oldvar = matches[int(ind)]
        else:
            oldvar = matches[0]
        data = data.rename(columns={oldvar: newname[i]})
    return data

Here is an example of how it works:

In [2]: df = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns = ['col1', 'col2', 'omg', 'idk'])
# First list = existing variables
# Second list = new names for those variables
In [3]: df = rename(df, ['col', 'omg'],['first', 'ohmy'])
Found multiple columns that matched col:
0: col1
1: col2

Please enter the index of the column you would like to rename: 0

In [4]: df.columns
Out[4]: Index(['first', 'col2', 'ohmy', 'idk'], dtype='object')

Assuming you can use a regular expression, this solution removes the need for manual encoding:

import pandas as pd
import re

srch = re.compile(r"\w+")

data = pd.read_csv("CSV_FILE.csv")
cols = data.columns
new_cols = list(map(lambda v:v.group(), (list(map(srch.search, cols)))))
data.columns = new_cols

I needed to rename features for XGBoost, and it didn't like any of these characters:

import re
regex = r"[!\"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~ ]+"
X_trn.columns = X_trn.columns.str.replace(regex, '_', regex=True)
X_tst.columns = X_tst.columns.str.replace(regex, '_', regex=True)

You can use lstrip or strip methods with index:

df.columns = df.columns.str.lstrip('$')

or

cols = ['$a', '$b', '$c', '$d', '$e']
pd.Series(cols).str.lstrip('$').tolist()

Output:

['a', 'b', 'c', 'd', 'e']

My one-line answer, df.columns = df_new_cols, is the best one, taking about 1/3 of the processing time.

timeit comparison: df has 7 columns. I am trying to change a few of the names.

%timeit df.rename(columns={old_col:new_col for (old_col,new_col) in zip(df_old_cols,df_new_cols)},inplace=True)
214 µs ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.rename(columns=dict(zip(df_old_cols,df_new_cols)),inplace=True)
212 µs ± 7.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.columns = df_new_cols
72.9 µs ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
