
Group and sum columns that have the same name in different cases with pandas

I have a pandas DataFrame with many columns that share the same name, except that the casing is inconsistent. Some columns are in all caps and need to be summed with the matching column. How can I combine all of these same-named columns while keeping the appropriate casing for the resulting column name?

I only want to change the case if another column name is exactly the same string apart from its casing. In this example I don't want to change the case of "JFK", but I do want to combine the values of "CARL" with "Carl".

Edit: I realized my first example table didn't include any name in all caps without a matching name in a different case, so I added "JFK".

Example df:

   Carl  CARL  Carl Smith  David  John  JFK
0     1     3           7      4     2    9

Desired output:

   Carl  Carl Smith  David  John  JFK
0     4           7      4     2    9
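To make the example reproducible, the input can be built as a one-row DataFrame. The row index 0 is an assumption, since the question only shows the values.

```python
import pandas as pd

# One-row DataFrame matching the example table in the question
df = pd.DataFrame(
    [[1, 3, 7, 4, 2, 9]],
    columns=["Carl", "CARL", "Carl Smith", "David", "John", "JFK"],
)
print(df)
```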

You can group the columns by a case-normalized version of their names:

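import string
# Normalize every column name with capwords, e.g. "CARL" -> "Carl"
reduced = [string.capwords(x) for x in df.columns]
# Sum columns that share a normalized name (groupby(axis=1) is deprecated since pandas 2.1, so transpose)
df.T.groupby(reduced).sum().T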

In one line:

import string
df.T.groupby(df.columns.map(string.capwords)).sum().T

Output (for the original example, before "JFK" was added):

   Carl  Carl Smith  David  John
0     4           7      4     2
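Note that `string.capwords` rewrites every name, not only the ones that have a case-insensitive duplicate, so on the edited example "JFK" is turned into "Jfk". The snippet below shows this on the column names alone:

```python
import string

# Column names from the edited example in the question
names = ["Carl", "CARL", "Carl Smith", "David", "John", "JFK"]
# capwords splits on whitespace, capitalizes each word (first letter upper, rest lower), and rejoins
normalized = [string.capwords(n) for n in names]
print(normalized)  # ['Carl', 'Carl', 'Carl Smith', 'David', 'John', 'Jfk']
```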

You could lowercase (or uppercase) the column names and then group the columns that share a name:

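# Lowercase all column names so that differently cased duplicates match
df.columns = [x.lower() for x in df.columns]
# Sum columns that now share a name (groupby(axis=1) is deprecated since pandas 2.1, so transpose)
df.T.groupby(level=0).sum().T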
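A self-contained run of this approach on the edited example shows the trade-off: "CARL" and "Carl" are summed, but every label, including "JFK", ends up lowercase, and the default `sort=True` reorders the columns alphabetically. This sketch uses transposition instead of `groupby(axis=1)`:

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 7, 4, 2, 9]],
                  columns=["Carl", "CARL", "Carl Smith", "David", "John", "JFK"])

# Lowercase every label (duplicate labels are allowed), then sum columns sharing a label
lowered = df.rename(columns=str.lower)
result = lowered.T.groupby(level=0).sum().T
print(result)
```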

Most of the other answers here don't account for names in all caps whose case I don't want to change, such as "JFK". This clarification was added in a later edit, so it probably slipped through the cracks.

I only want to change a column name if it becomes a duplicate once all column names are lowercased. For example, "CARL" and "Carl" become duplicates after lowercasing, but "David", "John", and "JFK" do not.

My solution:

import string
import pandas as pd

cols = pd.Series(df.columns)                                # "Carl", "CARL", "Carl Smith", "David", "John", "JFK"
cols_lower = cols.str.lower()                               # "carl", "carl", "carl smith", "david", "john", "jfk"
cols_duplicates = cols_lower[cols_lower.duplicated()]       # "carl"

names_to_group = df.columns[cols_lower.isin(cols_duplicates).to_numpy()]  # "Carl", "CARL"
new_names = names_to_group.map(string.capwords)             # "Carl", "Carl"

name_dict = dict(zip(names_to_group, new_names))            # {"Carl": "Carl", "CARL": "Carl"}
df = df.rename(columns=name_dict)
# Sum the now-duplicated labels; sort=False keeps the original column order.
# groupby(axis=1) is deprecated since pandas 2.1, so group the transposed frame.
df = df.T.groupby(level=0, sort=False).sum().T
df

# Output:
   Carl  Carl Smith  David  John  JFK
0     4           7      4     2    9
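As an alternative to `string.capwords`, the same rule can take the merged label from the spellings already present in the frame. The sketch below uses a hypothetical helper, `merge_case_duplicates`, and assumes that all column labels are strings and that the "appropriate" label for a merged group is its first spelling that is not all caps (falling back to its first spelling). That labelling preference is an assumption, not something the question specifies.

```python
import pandas as pd


def merge_case_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sum columns whose names differ only by case.

    Assumed labelling rule: a merged group keeps its first spelling that is
    not all caps, else its first spelling. Names without a case-insensitive
    duplicate are returned unchanged.
    """
    # Collect the original spellings for each lowercase key, in column order
    groups: dict[str, list[str]] = {}
    for name in df.columns:
        groups.setdefault(name.lower(), []).append(name)

    # Map every original spelling to the label its group should use
    label_for: dict[str, str] = {}
    for spellings in groups.values():
        preferred = next((s for s in spellings if not s.isupper()), spellings[0])
        for s in spellings:
            label_for[s] = preferred

    new_labels = pd.Index([label_for[name] for name in df.columns])
    # Group the transposed frame (groupby(axis=1) is deprecated since pandas 2.1);
    # sort=False keeps the labels in their original column order
    return df.T.groupby(new_labels, sort=False).sum().T


df = pd.DataFrame([[1, 3, 7, 4, 2, 9]],
                  columns=["Carl", "CARL", "Carl Smith", "David", "John", "JFK"])
print(merge_case_duplicates(df))
```

Unlike `string.capwords`, this never invents a new spelling: a name such as "McDonald" merged with "MCDONALD" would stay "McDonald" rather than becoming "Mcdonald".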
