
Renaming pandas data frame columns using a for loop

I'm not sure if this is a dumb way to go about things, but I've got several data frames, all of which have identical columns. I need to rename the columns within each one to reflect the name of that data frame (I'll be performing an outer merge of all of them afterwards).

Let's say the data frames are called df1, df2 and df3, and each contains the columns name, date, and count.

I'd like to rename each of the columns in df1 to name_df1, date_df1, and count_df1.

I've written a function to rename the columns, thus:

df_list=[df1, df2, df3]

def rename_cols():
    col_name="name"+suffix
    col_count="count"+suffix
    col_date="date"+suffix

for x in df_list:
    if x['name'].tail(1).item() == df1['name'].tail(1).item():
        suffix="_"+"df1"
        rename_cols()
        continue
    elif x['name'].tail(1).item() == df2['name'].tail(1).item():
        suffix="_"+"df2"
        rename_cols()
        continue
    else:
        suffix="_"+"df3"
        rename_cols()

    col_names=[col_name,col_date,col_count]
    x.columns=col_names

Unfortunately, I get the following error: KeyError: 'name'

I'm really struggling to figure out why that's going on. The columns of df1, the first data frame in df_list, get renamed. Everything else stays the same... Am I messing up basic syntax (probably), or do I have a fundamental misunderstanding of how things should work?

From what I can ascertain, the first data frame in the list is being iterated through more than once, but why would that be the case?
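A likely explanation, assuming the posted code lost some indentation but otherwise matches what was run: after df1 has been renamed, every later pass still evaluates df1['name'] in the if test, and that column no longer exists, hence KeyError: 'name'. A second problem is that col_name, col_date and col_count are assigned inside rename_cols(), so they are local to that function and invisible to the loop. The sketch below reproduces the first failure on a toy frame and then shows one possible fix that pairs each frame with its suffix explicitly; make_frame and suffixed_names are hypothetical helpers.

```python
import pandas as pd


def make_frame(tag):
    # Hypothetical stand-in for the question's frames (columns: name, date, count)
    return pd.DataFrame({"name": [tag], "date": ["2020-01-01"], "count": [1]})


# --- Reproducing the failure ---
df1 = make_frame("a")
df1.columns = ["name_df1", "date_df1", "count_df1"]  # first pass renames df1
try:
    df1["name"]  # later passes still compare against df1['name']
    failed = False
except KeyError:
    failed = True  # 'name' no longer exists on df1


# --- One possible fix ---
def suffixed_names(suffix):
    # Return the names: variables assigned inside a function stay local to it
    return ["name" + suffix, "date" + suffix, "count" + suffix]


df1, df2, df3 = make_frame("a"), make_frame("b"), make_frame("c")
for suffix, df in zip(["_df1", "_df2", "_df3"], [df1, df2, df3]):
    # Positional assignment: assumes the column order is name, date, count
    df.columns = suffixed_names(suffix)
```

Like the original x.columns assignment, the fix relies on every frame having its columns in the order name, date, count.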

I guess you can achieve this with something simpler, like this:

df_list=[df1, df2, df3]
for i, df in enumerate(df_list, 1):
    df.columns = [col_name+'_df{}'.format(i) for col_name in df.columns]
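A self-contained run of the enumerate approach, using small stand-in frames (make_frame is a hypothetical helper). Because df.columns is assigned on the same objects that df1, df2 and df3 refer to, the original variables see the new names, and the suffix number simply follows each frame's position in df_list.

```python
import pandas as pd


def make_frame(tag):
    # Hypothetical stand-in for the question's frames (columns: name, date, count)
    return pd.DataFrame({"name": [tag], "date": ["2020-01-01"], "count": [1]})


df1, df2, df3 = make_frame("a"), make_frame("b"), make_frame("c")
df_list = [df1, df2, df3]

# enumerate(..., 1) starts counting at 1, so the first frame gets "_df1"
for i, df in enumerate(df_list, 1):
    # Assigning df.columns mutates the frame that df1/df2/df3 also point to
    df.columns = [col_name + "_df{}".format(i) for col_name in df.columns]

print(list(df2.columns))
```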

If your DataFrames have prettier names, you can try:

df_names=('Home', 'Work', 'Park')
for df_name in df_names:
    df = globals()[df_name]
    df.columns = [col_name+'_{}'.format(df_name) for col_name in df.columns]
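A minimal runnable sketch of the name-based lookup, assuming Home, Work and Park are hypothetical frames defined at module level. globals() only sees module-level names, so this approach will not find frames that are local to a function.

```python
import pandas as pd

# Hypothetical module-level frames with "prettier" names
Home = pd.DataFrame({"name": ["a"], "date": ["2020-01-01"], "count": [1]})
Work = pd.DataFrame({"name": ["b"], "date": ["2020-01-02"], "count": [2]})
Park = pd.DataFrame({"name": ["c"], "date": ["2020-01-03"], "count": [3]})

df_names = ("Home", "Work", "Park")
for df_name in df_names:
    # Look the frame up by its variable name in the module namespace
    df = globals()[df_name]
    df.columns = [col_name + "_{}".format(df_name) for col_name in df.columns]

print(list(Work.columns))
```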

Or you can fetch the name of each variable by looking it up in globals() (or locals()):

df_list = [Home, Work, Park]
for df in df_list:
    name = [k for k, v in globals().items() if v is df and not k.startswith('_')][0]
    df.columns = [col_name+'_{}'.format(name) for col_name in df.columns]

My preferred, rather simple way of doing this, especially when you want to apply some logic to all column names, is:

for col in df.columns:
    df.rename(columns={col:col.upper().replace(" ","_")},inplace=True)
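The same per-column logic can also be passed to rename as a function, which applies it to every column label in a single call. The sketch below uses hypothetical column names and checks that the loop and the single call agree.

```python
import pandas as pd

# Hypothetical column names containing spaces and mixed case
original = pd.DataFrame(columns=["first name", "last name", "Date"])

# Loop version from the answer above: one rename call per column
looped = original.copy()
for col in looped.columns:
    looped.rename(columns={col: col.upper().replace(" ", "_")}, inplace=True)

# Single call: rename applies the function to every column label
mapped = original.rename(columns=lambda col: col.upper().replace(" ", "_"))

print(list(mapped.columns))
```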

I'll suppose that you have your data frames stored in a dictionary, as this is the idiomatic way of storing a series of named objects in Python. The idiomatic pandas way of changing your column names is to use a vectorised string operation on df.columns:

df_dict = {"df1": df1, "df2": df2, "df3": df3}
for name, df in df_dict.items():
    df.columns = df.columns + "_" + name
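A runnable version of the dictionary approach with stand-in frames (make_frame is a hypothetical helper). For comparison, it also uses DataFrame.add_suffix, a built-in pandas method that appends a string to every column label and returns a new frame instead of modifying the original.

```python
import pandas as pd


def make_frame(tag):
    # Hypothetical stand-in for the question's frames (columns: name, date, count)
    return pd.DataFrame({"name": [tag], "date": ["2020-01-01"], "count": [1]})


df_dict = {"df1": make_frame("a"), "df2": make_frame("b"), "df3": make_frame("c")}

# Vectorised: Index + str concatenates label by label, in place on each frame
for name, df in df_dict.items():
    df.columns = df.columns + "_" + name

# Built-in alternative on fresh frames: add_suffix leaves the input untouched
fresh = {"df1": make_frame("a"), "df2": make_frame("b"), "df3": make_frame("c")}
suffixed = {name: df.add_suffix("_" + name) for name, df in fresh.items()}

print(list(df_dict["df2"].columns))
```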

Another option to consider is adding the suffixes automatically during the merge. When you call merge, you can use the suffixes parameter to specify the suffixes that will be appended to duplicate column names. If you just want to append the names of the data frames, you can call it like this:

from functools import reduce
df_merged = reduce(lambda x,y: ("df_merged", 
                               x[1].merge(y[1], left_index=True, right_index=True, 
                                         suffixes = ("","_"+y[0]))),
                   df_dict.items())[1]
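A sketch of the merge-time approach with functools.reduce, written with a named helper (merge_pair, hypothetical) instead of a lambda for readability. It assumes stand-in frames that share the same index, and passes how="outer" because the question mentions an outer merge; the answer's own code uses the default inner join. The first frame keeps its unsuffixed names, since its suffix is the empty string.

```python
from functools import reduce

import pandas as pd


def sample(tag):
    # Hypothetical stand-in frames sharing the index 0..1
    return pd.DataFrame(
        {"name": [tag + "1", tag + "2"], "date": ["2020-01-01", "2020-01-02"], "count": [1, 2]}
    )


df_dict = {"df1": sample("a"), "df2": sample("b"), "df3": sample("c")}


def merge_pair(left, right):
    # left and right are (name, DataFrame) pairs; only the right name is used
    _, left_df = left
    right_name, right_df = right
    merged = left_df.merge(
        right_df,
        how="outer",
        left_index=True,
        right_index=True,
        suffixes=("", "_" + right_name),  # overlapping right-hand columns get "_dfN"
    )
    return ("df_merged", merged)


df_merged = reduce(merge_pair, df_dict.items())[1]
print(list(df_merged.columns))
```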

For completeness, since nobody has mentioned df.rename, see Andy Hayden's answer here:

Renaming columns in pandas

df.rename can take a function as an argument, so in this case: df.rename可以将函数作为参数,所以在这种情况下:

df_dict = {'df1':df1,'df2':df2,'df3':df3}
for name,df in df_dict.items():
    df.rename(lambda x: x+'_'+name, inplace=True)
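Why the columns= keyword matters: called with a bare function, rename applies it to the index (axis 0 by default), not to the column labels. The short comparison below, on a hypothetical one-row frame, shows both behaviours.

```python
import pandas as pd

# Hypothetical one-row frame with the question's columns
df = pd.DataFrame({"name": ["a"], "date": ["2020-01-01"], "count": [1]})

# Positional function: targets the index labels (here the integer 0)
by_index = df.rename(lambda x: f"{x}_df1")

# columns= keyword: targets the column labels, which is what the question wants
by_columns = df.rename(columns=lambda x: x + "_df1")

print(list(by_index.index), list(by_columns.columns))
```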

A simpler way

Take the field names from cursor.description (the first item of each entry is the column name), convert them into a list, and assign that list directly to df.columns:

num_fields = len(cursor.description)  # number of columns; not needed for the rename itself
field_names = [i[0] for i in cursor.description]
df.columns = field_names
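A self-contained sketch of the cursor.description approach, using an in-memory SQLite database from the standard sqlite3 module as a stand-in for whatever DB-API connection the answer had in mind; the visits table and its rows are made up for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table with the question's column names
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (name TEXT, date TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [("a", "2020-01-01", 1), ("b", "2020-01-02", 2)],
)

cursor = conn.execute("SELECT name, date, count FROM visits")
df = pd.DataFrame(cursor.fetchall())  # columns are 0, 1, 2 at this point

# Each description entry is a tuple whose first item is the column name
field_names = [i[0] for i in cursor.description]
df.columns = field_names

conn.close()
print(list(df.columns))
```

Passing the names straight to the constructor, pd.DataFrame(rows, columns=field_names), would avoid the separate assignment.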

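Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.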

 