User Defined Function transforming Python Pandas Dataframe Not Working
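Clearly I do not understand how the return statement in a user-defined function works. When I take the statements out of the function and run them directly, the code works. I think the problem is with the return statement.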
import pandas as pd
data = {"index_id": range(101, 131),
        'company': ['Opera', 'Opera', 'Opera', 'Opera', 'Opera', 'Opera',
                    'Firefox', 'Firefox', 'Firefox', 'Firefox', 'Firefox', 'Firefox',
                    'Safari', 'Safari', 'Safari', 'Safari', 'Safari', 'Safari',
                    'Brave', 'Brave', 'Brave', 'Brave', 'Brave', 'Brave',
                    'Chrome', 'Chrome', 'Chrome', 'Chrome', 'Chrome', 'Chrome'],
        "rating": [4, 5, 3, 3, 3, 3,
                   4, 5, 5, 1, 5, 5,
                   1, 4, 1, 2, 1, 2,
                   1, 5, 1, 5, 1, 5,
                   5, 5, 5, 4, 5, 4]
        }
df = pd.DataFrame(data)
def AggRankBinRenameJoin(df_unaggdf):
    # Aggregate the unaggregated df: std and mean of rating per company
    df_agg = df_unaggdf.groupby(['company']).agg({'rating': ['std', 'mean']})
    df_agg.columns = ['rating_std', 'rating_mean']
    print(df_agg)
    # Dense-rank each aggregate in descending order (largest value gets rank 1)
    df_rank = df_agg.rank(ascending=False, method='dense').add_prefix('rank_')
    print(df_rank)
    # Bin the ranks into quintiles and label each bin
    bin_labels = ['Diamond', 'Platinum', 'Gold', 'Silver', 'Bronze']
    #bin_labels_reverse = ['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond']
    df_bin = df_rank.apply(lambda x: pd.qcut(x, q=[0, .2, .4, .6, .8, 1], labels=bin_labels))
    print(df_bin)
    # Join aggregates, ranks, and bins into a single frame
    output = df_agg.join(df_rank).join(df_bin.add_prefix('bin_'))
    print(output)
    df_unaggdf = output.copy(deep=True)
    return df_unaggdf
AggRankBinRenameJoin(df)
You need to specify the DataFrame and/or variable that the return value is assigned to.
For example:
def AggRankBinRenameJoin(df_unaggdf):
    # Aggregate the unaggregated df: std and mean of rating per company
    df_agg = df_unaggdf.groupby(['company']).agg({'rating': ['std', 'mean']})
    df_agg.columns = ['rating_std', 'rating_mean']
    print(df_agg)
    # Dense-rank each aggregate in descending order (largest value gets rank 1)
    df_rank = df_agg.rank(ascending=False, method='dense').add_prefix('rank_')
    print(df_rank)
    # Bin the ranks into quintiles and label each bin
    bin_labels = ['Diamond', 'Platinum', 'Gold', 'Silver', 'Bronze']
    #bin_labels_reverse = ['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond']
    df_bin = df_rank.apply(lambda x: pd.qcut(x, q=[0, .2, .4, .6, .8, 1], labels=bin_labels))
    print(df_bin)
    # Join aggregates, ranks, and bins into a single frame
    output = df_agg.join(df_rank).join(df_bin.add_prefix('bin_'))
    print(output)
    df_unaggdf = output.copy(deep=True)
    return df_unaggdf
If you want the function to transform the original df, assign the result back to it:
df = AggRankBinRenameJoin(df)
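Note that I have not checked for any other potential errors. If there are any, it would help if you edited the question to show the error message.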
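Figured it out. The call to AggRankBinRenameJoin (which returns a DataFrame) needs to be assigned to a variable, so that the variable holds the returned DataFrame.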
import pandas as pd
data = {"index_id": range(101, 131),
        'company': ['Opera', 'Opera', 'Opera', 'Opera', 'Opera', 'Opera',
                    'Firefox', 'Firefox', 'Firefox', 'Firefox', 'Firefox', 'Firefox',
                    'Safari', 'Safari', 'Safari', 'Safari', 'Safari', 'Safari',
                    'Brave', 'Brave', 'Brave', 'Brave', 'Brave', 'Brave',
                    'Chrome', 'Chrome', 'Chrome', 'Chrome', 'Chrome', 'Chrome'],
        "rating": [4, 5, 3, 3, 3, 3,
                   4, 5, 5, 1, 5, 5,
                   1, 4, 1, 2, 1, 2,
                   1, 5, 1, 5, 1, 5,
                   5, 5, 5, 4, 5, 4]
        }
df = pd.DataFrame(data)
def AggRankBinRenameJoin(df_unaggdf):
    # Aggregate the unaggregated df: std and mean of rating per company
    df_agg = df_unaggdf.groupby(['company']).agg({'rating': ['std', 'mean']})
    df_agg.columns = ['rating_std', 'rating_mean']
    print(df_agg)
    # Dense-rank each aggregate in descending order (largest value gets rank 1)
    df_rank = df_agg.rank(ascending=False, method='dense').add_prefix('rank_')
    print(df_rank)
    # Bin the ranks into quintiles and label each bin
    bin_labels = ['Diamond', 'Platinum', 'Gold', 'Silver', 'Bronze']
    #bin_labels_reverse = ['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond']
    df_bin = df_rank.apply(lambda x: pd.qcut(x, q=[0, .2, .4, .6, .8, 1], labels=bin_labels))
    print(df_bin)
    # Join aggregates, ranks, and bins into a single frame
    output = df_agg.join(df_rank).join(df_bin.add_prefix('bin_'))
    print(output)
    df_unaggdf = output.copy(deep=True)
    return df_unaggdf
df1 = AggRankBinRenameJoin(df)
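The same point can be shown with a much smaller sketch. The names `summarize`, `scores`, and `result` are hypothetical and do not appear in the question; the aggregation is simplified to a single mean. The idea is only that rebinding a parameter name inside a function does not change the caller's variable, so the result has to be captured from the return value.

```python
import pandas as pd


def summarize(frame):
    # Hypothetical helper: rebinding the parameter name only changes the
    # local name; the caller's DataFrame is left untouched.
    frame = frame.groupby("company").agg(rating_mean=("rating", "mean"))
    return frame


scores = pd.DataFrame({"company": ["Opera", "Opera", "Brave"],
                       "rating": [4, 5, 1]})

summarize(scores)           # return value is discarded; scores is unchanged
result = summarize(scores)  # return value is captured in result

print(list(scores.columns))  # columns of the original, unaggregated frame
print(list(result.columns))  # columns of the aggregated frame
```

In an interactive session a bare call may still display the returned DataFrame, which can make it look as though the result was stored somewhere when it was not.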
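Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.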