Pandas: add a value to a new column for each row in a group

Question

I have a pandas dataframe with several columns. For example:

   #    name               abbr  country
0  454  Liverpool          UCL   England
1  454  Bayern Munich      UCL   Germany
2  223  Manchester United  UEL   England
3  454  Manchester City    UCL   England

Then I run a function using .groupby(), but afterwards I want to add the value I calculated once per group to every row of that group.

The example code is here:

import pandas as pd

def test_func(abbreviation):
    # Map a competition abbreviation to its full name
    if abbreviation == 'UCL':
        return 'UEFA Champions League'
    elif abbreviation == 'UEL':
        return 'UEFA Europe Leauge'

data = [[454, 'Liverpool', 'UCL', 'England'], [454, 'Bayern Munich', 'UCL', 'Germany'], [223, 'Manchester United', 'UEL', 'England'], [454, 'Manchester City', 'UCL', 'England']]
df = pd.DataFrame(data, columns=['#', 'name', 'abbr', 'country'])
# One row per group; '#' becomes the index of competition_df
competition_df = df.groupby('#').first()
# Compute the competition name once per group
competition_df['competition'] = competition_df.apply(lambda row: test_func(row["abbr"]), axis=1)

Now I would like to add the value of "competition" to every row of the original dataframe (df), based on the group that row belongs to.

Is there a good way (using "native" pandas) to do this without iterating, building lists, etc.?

Edit 1:

The final output would then be the original dataframe (df) with the new column:

   #    name               abbr  country  competition
0  454  Liverpool          UCL   England  UEFA Champions League
1  454  Bayern Munich      UCL   Germany  UEFA Champions League
2  223  Manchester United  UEL   England  UEFA Europe Leauge
3  454  Manchester City    UCL   England  UEFA Champions League

Edit 2:

I managed to get what I want by zipping, but it's a very bad implementation, and I am still wondering whether I could do it better (and faster) using some pandas functions:

zipped = zip(competition_df.index, competition_df['competition'])
df['competition'] = None  # placeholder column; object dtype accepts the strings assigned below
for num, comp in zipped:
    df.loc[df['#'] == num, 'competition'] = comp

Answer 1

I think these might be helpful.

import pandas

data = [[454, 'Liverpool', 'UCL', 'England'], [454, 'Bayern Munich', 'UCL', 'Germany'], [223, 'Manchester United', 'UEL', 'England'], [454, 'Manchester City', 'UCL', 'England']]
df = pandas.DataFrame(data, columns=['#','name','abbr', 'country'])

# option 1
abbreviation_dict = {
    'UCL': 'UEFA Champions League',
    'UEL': 'UEFA Europe Leauge'
}

df['competition'] = df['abbr'].replace(abbreviation_dict)

# option 2 using a function
def get_dict_for_replace(unique_values):
    some_dict = {}
    for unique_value in unique_values:
        if unique_value == 'UCL':
            value_1 = 'UEFA Champions League'  # or whatever is complicated
            some_dict.update({'UCL': value_1})

        elif unique_value  == 'UEL':
            value_2 = 'UEFA Europe Leauge'  # or whatever is complicated
            some_dict.update({'UEL': value_2})

    return some_dict

# get your unique values,
unique_values = df['abbr'].unique() 
# get your dictionary
abbreviation_dict = get_dict_for_replace(unique_values)

df['competition'] = df['abbr'].replace(abbreviation_dict)
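One behavioral difference worth knowing when choosing between a dictionary with `.replace` and the closely related `.map`: `Series.replace` leaves values that are not in the dictionary unchanged, whereas `Series.map` turns them into NaN. The sketch below uses a hypothetical unmapped abbreviation, `'XYZ'`, to show this.

```python
import pandas as pd

abbreviation_dict = {
    'UCL': 'UEFA Champions League',
    'UEL': 'UEFA Europe Leauge',
}
# 'XYZ' is a hypothetical value that is deliberately missing from the dict
abbr = pd.Series(['UCL', 'UEL', 'XYZ'])

replaced = abbr.replace(abbreviation_dict)  # unmatched values are left unchanged
mapped = abbr.map(abbreviation_dict)        # unmatched values become NaN

print(replaced.tolist())  # ['UEFA Champions League', 'UEFA Europe Leauge', 'XYZ']
print(mapped.tolist())    # ['UEFA Champions League', 'UEFA Europe Leauge', nan]
```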

Without knowing your exact problem, this is probably the most general approach if you want to use a function: run each calculation once, then pass the results to the dataframe. You can probably build your dictionary more efficiently based on your actual requirements.

Aside: using groupby on '#' instead of 'abbr' might have unwanted consequences unless the mapping between the two columns is one-to-one.
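Whether that caveat applies to a given dataset can be checked before grouping. A minimal sketch: count distinct values in each direction with `nunique`; the mapping between `'#'` and `'abbr'` is one-to-one only if every count is 1. Taking `.first()` per `'#'` specifically relies on the first check, since a group with several abbreviations would silently keep only one of them.

```python
import pandas as pd

data = [[454, 'Liverpool', 'UCL', 'England'],
        [454, 'Bayern Munich', 'UCL', 'Germany'],
        [223, 'Manchester United', 'UEL', 'England'],
        [454, 'Manchester City', 'UCL', 'England']]
df = pd.DataFrame(data, columns=['#', 'name', 'abbr', 'country'])

# Each '#' should correspond to exactly one abbreviation...
one_abbr_per_id = df.groupby('#')['abbr'].nunique().eq(1).all()
# ...and each abbreviation to exactly one '#'.
one_id_per_abbr = df.groupby('abbr')['#'].nunique().eq(1).all()

is_one_to_one = bool(one_abbr_per_id and one_id_per_abbr)
print(is_one_to_one)  # True for the sample data
```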

Answer 1
1 2022-12-19 13:06:41
