How to smooth and plot x vs. the weighted average of y, weighted by x?

Question

I have a dataframe with a column of weights and one of values. I need:

1. to discretise the weights and, for each interval of weights, plot the weighted average of the values, then
2. to extend the same logic to another variable: discretise z and, for each interval, plot the weighted average of the values, weighted by the weights

Is there an easy way to achieve this? I have found a way, but it seems a bit cumbersome:

1. I discretise the dataframe with pandas.cut()
2. do a groupby and calculate the weighted average
3. plot the mean of each bin vs. the weighted average

I have also tried to smooth the curve with a spline, but it doesn't do much.

Basically, I'm looking for a better way to produce a smoother curve.

My output looks like this:

and my code, with some random data, is:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.interpolate import make_interp_spline

n = int(1e3)
df = pd.DataFrame()
np.random.seed(10)
df['w'] = np.arange(0, n)
df['v'] = np.random.randn(n)
df['ranges'] = pd.cut(df.w, bins=50)
df['one'] = 1.

def func(x, df):
    # func() is called once per group: x is the sub-frame for one bin,
    # df is the entire table (not used here)
    b1 = x['one'].sum()
    b2 = x['w'].mean()
    b3 = x['v'].mean()
    b4 = (x['w'] * x['v']).sum() / x['w'].sum() if x['w'].sum() > 0 else np.nan

    cols = ['# items', 'avg w', 'avg v', 'weighted avg v']
    return pd.Series([b1, b2, b3, b4], index=cols)

summary = df.groupby('ranges').apply(lambda x: func(x, df))

sns.set(style='darkgrid')

fig, ax = plt.subplots(2)
# recent seaborn versions require x and y to be passed as keywords
sns.lineplot(x=summary['avg w'], y=summary['weighted avg v'], ax=ax[0])
ax[0].set_title('line plot')

xnew = np.linspace(summary['avg w'].min(), summary['avg w'].max(), 100)
spl = make_interp_spline(summary['avg w'], summary['weighted avg v'], k=5)  # BSpline object
power_smooth = spl(xnew)
sns.lineplot(x=xnew, y=power_smooth, ax=ax[1])
ax[1].set_title('not-so-interpolated plot')

Answer 1

The first part of your question is rather easy to do.

I'm not sure what you mean by the second part. Do you want a (simplified) reproduction of your code, or a new approach that better fits your needs?

Anyway, I had to look at your code to understand what you mean by weighting the values. I think people would normally expect something different from the term (just as a warning).

Here's the simplified version of your approach:

df['prod_v_w'] = df['v']*df['w']
weighted_avg_v = df.groupby(pd.cut(df.w, bins=50))[['prod_v_w','w']].sum()\
                   .eval('prod_v_w/w')
print(np.allclose(weighted_avg_v, summary['weighted avg v']))
Out[18]: True
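For the second part of the question (binning by another variable z while still weighting by w), the same sum-of-products idea can be wrapped in a small helper. This is only a sketch under assumptions: the function name binned_weighted_average, the demo frame, and the z column are made up for illustration, since the question's sample data has no z.

```python
import numpy as np
import pandas as pd


def binned_weighted_average(df, by, value, weight, bins=50):
    """Weighted average of `value` (weights from `weight`) in equal-width bins of `by`."""
    groups = pd.cut(df[by], bins=bins)
    sums = (
        pd.DataFrame({"prod": df[value] * df[weight], "weight": df[weight], "by": df[by]})
        .groupby(groups, observed=False)
        .agg(prod=("prod", "sum"), weight=("weight", "sum"), centre=("by", "mean"))
    )
    # Bins whose total weight is not positive yield NaN rather than a division by zero
    total = sums["weight"].where(sums["weight"] > 0)
    return pd.DataFrame({"bin centre": sums["centre"], "weighted avg": sums["prod"] / total})


# Hypothetical demo data: w and v as in the question, plus an invented z column
rng = np.random.default_rng(10)
n = 1000
demo = pd.DataFrame({
    "w": np.arange(n, dtype=float),
    "v": rng.standard_normal(n),
    "z": rng.uniform(0, 10, n),
})

by_w = binned_weighted_average(demo, by="w", value="v", weight="w")           # part 1
by_z = binned_weighted_average(demo, by="z", value="v", weight="w", bins=20)  # part 2
```

Plotting by_z["bin centre"] against by_z["weighted avg"] would then give the curve for z, in the same way as the plot for w.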

Answer 2

I think you're using too few values for the interpolation. By changing xnew = np.linspace(summary['avg w'].min(), summary['avg w'].max(),100) to xnew = np.linspace(summary['avg w'].min(), summary['avg w'].max(),500), I get the following:

And changing the spline degree to k=2, I get the following:

I think a good starting point for the interpolation could be n/2 points and k=2, as it distorts the data less. Hope it helps.
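One thing worth keeping in mind: make_interp_spline builds an interpolating spline, so the curve passes through every binned point no matter how many evaluation points are used. If the goal is a curve that is actually smoother than the data, a smoothing spline is one alternative. The sketch below uses scipy.interpolate.UnivariateSpline on made-up stand-in data (not the question's summary table), and the smoothing factor s is an assumed value that would need tuning.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline, make_interp_spline

# Stand-in data: x plays the role of summary['avg w'],
# y the role of summary['weighted avg v'] (a noisy curve)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
noise_sd = 0.3
y = np.sin(x) + rng.normal(0.0, noise_sd, x.size)

# Interpolating spline: reproduces every data point exactly
interp = make_interp_spline(x, y, k=3)

# Smoothing spline: s bounds the sum of squared residuals, larger s gives a smoother curve.
# s = (number of points) * (noise variance) is a common rule of thumb, assumed here.
smooth = UnivariateSpline(x, y, k=3, s=x.size * noise_sd**2)

x_new = np.linspace(x.min(), x.max(), 500)
y_interp = interp(x_new)
y_smooth = smooth(x_new)
```

Plotting y_interp and y_smooth against x_new shows the difference: the first follows the noise, the second trades fidelity to individual points for a smoother shape.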

Answer 3

If I'm understanding correctly, you're trying to recreate a rolling average.

This is already a capability of pandas dataframes, using the rolling function:

dataframe.rolling(n).mean()

where n is the number of adjacent points used in the 'window' or 'bin' for the average, so you can tweak it for different degrees of smoothness.

You can find examples here:
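As a minimal sketch of how the window size behaves (the data and variable names below are illustrative, not taken from the question): a plain rolling mean is unweighted, leaves the first n-1 entries as NaN by default, and the center and min_periods arguments can be used to change that.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(10)
values = pd.Series(rng.standard_normal(1000))

window = 100

# Default: trailing window, so the first window-1 results are NaN
trailing = values.rolling(window).mean()

# Centred window that also accepts partial windows at the edges, so no NaN is produced
centred = values.rolling(window, center=True, min_periods=1).mean()

# A wider window averages over more points and gives a smoother curve
wider = values.rolling(2 * window).mean()
```

Note that this averages the values only; weighting by another column needs a rolling sum of the products divided by a rolling sum of the weights.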

Answer 4

I think this is a solution to what you are seeking. It uses a rolling window, as others have suggested. A little more work was needed to get it working properly.

df["w*v"] = df["w"] * df["v"]

def rolling_smooth(df,N):
    df_roll = df.rolling(N).agg({"w":["sum","mean"],"v":["mean"],"w*v":["sum"]})
    df_roll.columns = [' '.join(col).strip() for col in df_roll.columns.values]
    df_roll['weighted avg v'] = np.nan
    cond = df_roll['w sum'] > 0
    df_roll.loc[cond,'weighted avg v'] = df_roll.loc[cond,'w*v sum'] / df_roll.loc[cond,'w sum']
    return df_roll

df_roll_100 = rolling_smooth(df,100)
df_roll_200 = rolling_smooth(df,200)

plt.plot(summary['avg w'], summary['weighted avg v'],label='original')
plt.plot(df_roll_100["w mean"],df_roll_100["weighted avg v"],label='rolling N=100')
plt.plot(df_roll_200["w mean"],df_roll_200["weighted avg v"],label='rolling N=200')
plt.legend()


Answer 1
1 2019-04-05 17:19:26

Answer 2
1 2019-04-11 19:24:12

Answer 3
0 2019-04-05 14:37:44

Answer 4
0 2019-04-11 07:17:07
