Pandas Series.rename not reflected in DataFrame columns

Question

I'm trying to rename a column based on validating the values in that column. Here is the set-up:

In [9]: import pandas as pd

In [10]: df = pd.DataFrame(
    ...:         {"unknown_field": ['bob@gmail.com', 'shirley@gmail.com', 'groza@pubg.com']}
    ...:     )

In [11]: df
Out[11]: 
       unknown_field
0      bob@gmail.com
1  shirley@gmail.com
2     groza@pubg.com

The function validate_column(ser) takes a pandas.Series object as its parameter, validates the values in that column, and renames the column to one of a pre-defined set of column names. To keep it simple, in this example the column is validated as an email column.

In [12]: def validate_column(ser):
    ...:     # Value validation method returns that this column is email column
    ...:     ser.rename('email', inplace=True)
    ...:

The Series' current name is unknown_field, and, as expected, it changes to email after calling validate_column:

In [13]: df.unknown_field
Out[13]: 
0        bob@gmail.com
1    shirley@gmail.com
2       groza@pubg.com
Name: unknown_field, dtype: object

In [14]: validate_column(df.unknown_field)

In [15]: df.unknown_field
Out[15]: 
0        bob@gmail.com
1    shirley@gmail.com
2       groza@pubg.com
Name: email, dtype: object

However, the column name within df is not modified as I expected. It is still named unknown_field in the df variable:

In [16]: df
Out[16]: 
       unknown_field
0      bob@gmail.com
1  shirley@gmail.com
2     groza@pubg.com
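A minimal, self-contained check of where the label actually lives: a DataFrame keeps its column labels in df.columns, and renaming a Series taken out of the frame only changes that Series object's name attribute. This sketch uses the non-in-place form of rename. That the in-place call above still shows Name: email on a later df.unknown_field access appears to rely on pandas handing back a cached Series object, which is an implementation detail and may not hold in every pandas version (for example with Copy-on-Write enabled).

```python
import pandas as pd

df = pd.DataFrame(
    {"unknown_field": ["bob@gmail.com", "shirley@gmail.com", "groza@pubg.com"]}
)

ser = df["unknown_field"]      # a Series taken out of the frame
renamed = ser.rename("email")  # a new Series object whose name is "email"

print(renamed.name)       # email
print(ser.name)           # unknown_field
print(list(df.columns))   # ['unknown_field']: the frame's labels are unchanged
```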

Currently, I use the following code to manually update the column names in my df variable:

In [17]: for col in df.select_dtypes(object):
    ...:     df.rename(columns={col: df[col].name}, inplace=True)
    ...:     

In [18]: df
Out[18]: 
               email
0      bob@gmail.com
1  shirley@gmail.com
2     groza@pubg.com


My question is:

Is there a more efficient or straightforward way to rename the Series so that the change is reflected directly in the DataFrame?

Answer 1

Rewrite your function to accept two parameters:

def validate_column(df, col_name):
    # Value validation method returns that this column is email column
    return df.rename({col_name : 'email'}, axis=1)

Now, call your function through DataFrame.pipe:

df.pipe(validate_column, col_name='unknown_field')

               email
0      bob@gmail.com
1  shirley@gmail.com
2     groza@pubg.com

Very clean. This is useful if you want to chain validations:

(df.pipe(validate_column, col_name='unknown_field')
   .pipe(validate_column, col_name='some_other_field')
   .pipe(validate_column, col_name='third_field')
)

... or modify validate_column to validate multiple columns at a time.

Note that the renaming is no longer done in place, so whatever pipe returns needs to be assigned back (for example, df = df.pipe(...)).
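Following the suggestion to validate several columns at a time, here is a minimal sketch of a pipe-compatible variant. The names validate_columns and looks_like_email, the '@' test, and the other_field sample column are all hypothetical placeholders, not part of the original answer; only DataFrame.rename and DataFrame.pipe are real pandas API.

```python
from collections.abc import Iterable

import pandas as pd


def looks_like_email(ser: pd.Series) -> bool:
    # Hypothetical stand-in for the real value validation.
    return bool(ser.astype(str).str.contains("@", regex=False).all())


def validate_columns(df: pd.DataFrame, col_names: Iterable[str]) -> pd.DataFrame:
    """Check several columns in one call and return a renamed copy of df."""
    # If more than one column matched, all of them would become "email";
    # real code would need a rule for that case.
    mapping = {col: "email" for col in col_names if looks_like_email(df[col])}
    return df.rename(columns=mapping)


df = pd.DataFrame(
    {
        "unknown_field": ["bob@gmail.com", "shirley@gmail.com", "groza@pubg.com"],
        "other_field": ["bob", "shirley", "groza"],
    }
)

# pipe returns a new DataFrame, so the result is assigned back.
df = df.pipe(validate_columns, col_names=["unknown_field", "other_field"])
print(df.columns.tolist())  # ['email', 'other_field']
```

Building a single mapping and calling rename once keeps the function free of side effects, which is what makes it safe to chain with further .pipe calls.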

Answer 2

Use the DataFrame's rename method and set the columns argument:

import pandas as pd
df = pd.DataFrame({"unknown_field": ['bob@gmail.com', 'shirley@gmail.com', 'groza@pubg.com']})
df = df.rename(columns={'unknown_field': 'email'})

Output:

               email
0      bob@gmail.com
1  shirley@gmail.com
2     groza@pubg.com
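The columns= mapping does not have to be written by hand. As a minimal sketch, assuming the validator is changed to return a label instead of renaming in place, the mapping can be built from the validator's results and applied with one rename call. The function infer_column_name, its '@' check, and the extra age column are hypothetical and only for illustration.

```python
from collections.abc import Hashable

import pandas as pd


def infer_column_name(ser: pd.Series) -> Hashable:
    """Hypothetical validator: return a label rather than renaming in place.

    The '@' check is only a placeholder for real email validation.
    """
    if ser.astype(str).str.contains("@", regex=False).all():
        return "email"
    return ser.name  # no rule matched: keep the current label


df = pd.DataFrame(
    {
        "unknown_field": ["bob@gmail.com", "shirley@gmail.com", "groza@pubg.com"],
        "age": [31, 27, 45],
    }
)

# One {old_label: new_label} mapping, then a single DataFrame.rename call.
mapping = {col: infer_column_name(df[col]) for col in df.columns}
df = df.rename(columns=mapping)
print(df.columns.tolist())  # ['email', 'age']
```

This iterates over every column; the question's df.select_dtypes(object) filter could be reapplied if only object columns should be checked.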

2 answers

Answer 1
Score 2 (accepted), 2018-06-26 06:46:15

Answer 2
Score 0, 2018-06-26 07:07:08