pandas: add a value to a new column for each row in a group
I have a pandas dataframe with several columns. For example:
     #               name  abbr  country
0  454          Liverpool   UCL  England
1  454      Bayern Munich   UCL  Germany
2  223  Manchester United   UEL  England
3  454    Manchester City   UCL  England
I run a function using .groupby(), but then I want to add the value I calculated once per group to each row of that group.
The example code is here:
import pandas as pd

def test_func(abbreviation):
    if abbreviation == 'UCL':
        return 'UEFA Champions League'
    elif abbreviation == 'UEL':
        return 'UEFA Europe Leauge'

data = [[454, 'Liverpool', 'UCL', 'England'], [454, 'Bayern Munich', 'UCL', 'Germany'], [223, 'Manchester United', 'UEL', 'England'], [454, 'Manchester City', 'UCL', 'England']]
df = pd.DataFrame(data, columns=['#', 'name', 'abbr', 'country'])

# keep one row per '#' group (its first row), then compute the value once per group
competition_df = df.groupby('#').first()
competition_df['competition'] = competition_df.apply(lambda row: test_func(row["abbr"]), axis=1)
Now I would like to add the value of "competition" to every row of the original dataframe (df), based on the row's group.
Is there a good way (using "native" pandas) to do this without iterating, building lists, etc.?
Edit 1:
The final output would then be the original dataframe (df) with the new column:
     #               name  abbr  country            competition
0  454          Liverpool   UCL  England  UEFA Champions League
1  454      Bayern Munich   UCL  Germany  UEFA Champions League
2  223  Manchester United   UEL  England     UEFA Europe Leauge
3  454    Manchester City   UCL  England  UEFA Champions League
Edit 2:
I managed to get what I want by zipping, but it's a very bad implementation and I still wonder whether I could do it better (and faster) using some pandas functions:
zipped = zip(competition_df.index, competition_df['competition'])
df['competition'] = None  # start as an object column so strings can be assigned without a dtype-upcast warning
for num, comp in zipped:
    df.loc[df['#'] == num, 'competition'] = comp
I think these might be helpful:
import pandas
data = [[454, 'Liverpool', 'UCL', 'England'], [454, 'Bayern Munich', 'UCL', 'Germany'], [223, 'Manchester United', 'UEL', 'England'], [454, 'Manchester City', 'UCL', 'England']]
df = pandas.DataFrame(data, columns=['#','name','abbr', 'country'])
# option 1
abbreviation_dict = {
    'UCL': 'UEFA Champions League',
    'UEL': 'UEFA Europe Leauge'
}
df['competition'] = df['abbr'].replace(abbreviation_dict)
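One detail worth knowing when choosing between `replace` and `map` for this lookup: `replace` leaves values that are not in the dictionary unchanged, while `map` turns them into NaN. The small sketch below shows the difference; `'UECL'` is a hypothetical abbreviation that is deliberately missing from the dictionary.

```python
import pandas as pd

abbreviation_dict = {
    'UCL': 'UEFA Champions League',
    'UEL': 'UEFA Europe Leauge'
}

# 'UECL' is a hypothetical value with no entry in abbreviation_dict
abbr = pd.Series(['UCL', 'UEL', 'UECL'])

replaced = abbr.replace(abbreviation_dict)  # unmatched values are kept as-is
mapped = abbr.map(abbreviation_dict)        # unmatched values become NaN

print(replaced.tolist())
print(mapped.tolist())
```

If a missing abbreviation should be visible as a gap rather than silently passed through, `map` is the safer choice.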
# option 2 using a function
def get_dict_for_replace(unique_values):
    some_dict = {}
    for unique_value in unique_values:
        if unique_value == 'UCL':
            value_1 = 'UEFA Champions League'  # or whatever complicated calculation you need
            some_dict.update({'UCL': value_1})
        elif unique_value == 'UEL':
            value_2 = 'UEFA Europe Leauge'  # or whatever complicated calculation you need
            some_dict.update({'UEL': value_2})
    return some_dict
# get the unique values
unique_values = df['abbr'].unique()
# get your dictionary
abbreviation_dict = get_dict_for_replace(unique_values)
df['competition'] = df['abbr'].replace(abbreviation_dict)
Without knowing your exact problem, this is probably the most general approach if you want to use a function: run each calculation once, then pass the results to the dataframe. You can probably build the dictionary more efficiently based on your actual requirements.
Aside: grouping on '#' instead of 'abbr' might have unwanted consequences unless the mapping between the two columns is 1-to-1.
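A quick way to check whether that assumption holds for a given dataframe is to count distinct values per group in both directions with `nunique`. This is a sketch using the question's sample data; on real data either check could come out False.

```python
import pandas as pd

data = [[454, 'Liverpool', 'UCL', 'England'], [454, 'Bayern Munich', 'UCL', 'Germany'],
        [223, 'Manchester United', 'UEL', 'England'], [454, 'Manchester City', 'UCL', 'England']]
df = pd.DataFrame(data, columns=['#', 'name', 'abbr', 'country'])

# Each '#' should correspond to exactly one 'abbr'; otherwise groupby('#').first()
# silently picks one abbreviation for the whole group
each_num_one_abbr = bool(df.groupby('#')['abbr'].nunique().eq(1).all())

# And each 'abbr' should correspond to exactly one '#' for the mapping to be 1-to-1
each_abbr_one_num = bool(df.groupby('abbr')['#'].nunique().eq(1).all())

print(each_num_one_abbr, each_abbr_one_num)
```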