pandas: add a value to a new column for each row in a group

I have a pandas DataFrame with several columns, for example:

     #  name               abbr  country
0  454  Liverpool          UCL   England
1  454  Bayern Munich      UCL   Germany
2  223  Manchester United  UEL   England
3  454  Manchester City    UCL   England

I then run a function using .groupby(), but afterwards I want to add the value I calculated once per group to each row of that group.

The example code is here:

import pandas as pd

def test_func(abbreviation):
    # Map a competition abbreviation to its full name
    if abbreviation == 'UCL':
        return 'UEFA Champions League'
    elif abbreviation == 'UEL':
        return 'UEFA Europe Leauge'

data = [[454, 'Liverpool', 'UCL', 'England'], [454, 'Bayern Munich', 'UCL', 'Germany'], [223, 'Manchester United', 'UEL', 'England'], [454, 'Manchester City', 'UCL', 'England']]
df = pd.DataFrame(data, columns=['#', 'name', 'abbr', 'country'])
# One row per '#' group (its first row); the group key '#' becomes the index
competition_df = df.groupby('#').first()
# Compute the competition name once per group
competition_df['competition'] = competition_df.apply(lambda row: test_func(row["abbr"]), axis=1)

Now I would like to add the value of "competition" to every row of the original DataFrame (df), according to the row's group.

Is there a good way (using "native" pandas) to do this without iterations, lists, etc.?


Edit 1:

The final output would then be the original DataFrame (df) with the new column:

     #  name               abbr  country  competition
0  454  Liverpool          UCL   England  UEFA Champions League
1  454  Bayern Munich      UCL   Germany  UEFA Champions League
2  223  Manchester United  UEL   England  UEFA Europe Leauge
3  454  Manchester City    UCL   England  UEFA Champions League

Edit 2:

I managed to get what I want by zipping, but it's a very bad implementation, and I still wonder whether I could do it better (and faster) using some pandas functions:

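# Assumes df and competition_df from the example code above
zipped = zip(competition_df.index, competition_df['competition'])
df['competition'] = None  # object column, so assigning strings below avoids pandas' incompatible-dtype warning
for num, comp in zipped:
    df.loc[df['#'] == num, 'competition'] = comp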

I think these might be helpful.

import pandas

data = [[454, 'Liverpool', 'UCL', 'England'], [454, 'Bayern Munich', 'UCL', 'Germany'], [223, 'Manchester United', 'UEL', 'England'], [454, 'Manchester City', 'UCL', 'England']]
df = pandas.DataFrame(data, columns=['#','name','abbr', 'country'])

# option 1
abbreviation_dict = {
    'UCL': 'UEFA Champions League',
    'UEL': 'UEFA Europe Leauge'
}

df['competition'] = df['abbr'].replace(abbreviation_dict)
# option 2 using a function
def get_dict_for_replace(unique_values):
    some_dict = {}
    for unique_value in unique_values:
        if unique_value == 'UCL':
            value_1 = 'UEFA Champions League'  # or whatever is complicated
            some_dict.update({'UCL': value_1})

        elif unique_value  == 'UEL':
            value_2 = 'UEFA Europe Leauge'  # or whatever is complicated
            some_dict.update({'UEL': value_2})

    return some_dict

# get your unique values
unique_values = df['abbr'].unique()
# get your dictionary
abbreviation_dict = get_dict_for_replace(unique_values)

df['competition'] = df['abbr'].replace(abbreviation_dict)

Without knowing your exact problem, this is probably the most general approach if you want to use a function: run each calculation once per unique value, then pass the results to the DataFrame. You can probably build your dictionary more efficiently depending on your actual requirements.

Aside: grouping on '#' instead of 'abbr' might have unwanted consequences unless the mapping between the two columns is one-to-one.
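A quick way to check that assumption before relying on groupby('#').first(), sketched with a hypothetical helper single_abbr_per_group and a hypothetical extra row (Arsenal in group 223) added only to show the failure case. Strictly, the direction that matters for .first() is that each '#' maps to a single 'abbr'.

```python
import pandas as pd

data = [[454, 'Liverpool', 'UCL', 'England'],
        [454, 'Bayern Munich', 'UCL', 'Germany'],
        [223, 'Manchester United', 'UEL', 'England'],
        [454, 'Manchester City', 'UCL', 'England']]
df = pd.DataFrame(data, columns=['#', 'name', 'abbr', 'country'])

def single_abbr_per_group(frame):
    # Hypothetical helper: True when every '#' group has exactly one distinct 'abbr'
    return bool(frame.groupby('#')['abbr'].nunique().eq(1).all())

print(single_abbr_per_group(df))  # prints True for the question's sample data

# Hypothetical extra row: group 223 now mixes 'UEL' and 'UCL'
extra = pd.DataFrame([[223, 'Arsenal', 'UCL', 'England']], columns=df.columns)
df_mixed = pd.concat([df, extra], ignore_index=True)

print(single_abbr_per_group(df_mixed))  # prints False
# .first() silently keeps only the first row's value for group 223
print(df_mixed.groupby('#').first().loc[223, 'abbr'])  # prints UEL
```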


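Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.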

© 2020-2024 STACKOOM.COM