Python/Plotly: How to make each data point on Scatter plot represent median value?
Here is my dataset:
import numpy as np
import pandas as pd

# 13 observations for vendor A (month 1 appears twice) and 12 for vendor B
ob1=np.linspace(1, 10, 13).round(2).tolist()
ob2=np.linspace(10, 1, 12).round(2).tolist()
ob=ob1+ob2
ex_dic={'Vendor':['A','A','A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B','B'],
        'Month':[1,1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12],
        'Observation':ob
        }
ex_df=pd.DataFrame.from_dict(ex_dic)
It looks like this:
Here is the code for my Plotly visualization:
import plotly.graph_objects as go

ex_month_list=ex_df.Month.unique().tolist()
ex_vendor_list=ex_df.Vendor.unique().tolist()
fig=go.Figure()
# one trace per vendor
for i in ex_vendor_list:
    by_vendor_df=ex_df.loc[ex_df['Vendor']==i]
    fig.add_trace(go.Scatter(x=by_vendor_df.Month, y=by_vendor_df.Observation, name=str(i),
                             mode='lines+markers', marker_line_width=2, marker_size=8))
It will show something like this: the Y-axis shows the observations (1-10) and the X-axis shows the months (1-12).
Here is where the problem is:
I have tried applying median() here and there, but I cannot manage to make my plot represent the median observation for each month. For example, here is what I have come up with so far (in terms of logic):
import statistics as stat

for i in vendor_list:
    vendor_df=some_df.loc[some_df['Vendor']==i]
    for m in month_list:
        month_df=vendor_df.loc[vendor_df['Month']==m]
        by_month_observations=month_df['Observation'].to_list()
        # median of all observations for this vendor and month
        median_val=stat.median(by_month_observations)
        print(median_val)
The code above does return the median values and works fine, BUT now that some months went from 2 observations to 1, I cannot append the result back to the dataframe since the lengths are no longer the same. Therefore, I am not sure this is the best way to go.
Looking at the code above, please let me know the smart way to go about this, so that each data point shown is the median value for each month by vendor. Help is really appreciated!
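On the length mismatch specifically: one possible way around it (not what the post ends up using) is pandas' groupby(...).transform('median'), which returns one value per original row, so the result can be assigned back as a column. The sketch below uses a small hypothetical frame, and the names df and Median are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical data: vendor A has two observations for month 1.
df = pd.DataFrame({
    'Vendor': ['A', 'A', 'A', 'B'],
    'Month': [1, 1, 2, 1],
    'Observation': [1.0, 3.0, 5.0, 7.0],
})

# transform('median') broadcasts each group's median back to every row
# of that group, so the output has the same length as df.
df['Median'] = df.groupby(['Vendor', 'Month'])['Observation'].transform('median')
```

This keeps every original row; the duplicated rows simply carry the same median, so they would still need to be dropped or aggregated before plotting one point per month.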
Well, I figured out how to do it myself: a simple use of .groupby() did the job!
Here is the df I used while trying to solve my problem:
some_dic={'Vendor':['A','A','A','A','B','B','B','B','B'],
          'Month':[6,7,8,8,6,7,8,8,8],
          'Observation':[1,2,3,4,10,8,6,3,1]
          }
some_df=pd.DataFrame.from_dict(some_dic)
Here is the code that successfully generated the plot with median values:
...
grouped_df=vendor_df.groupby(vendor_df.Month)[['Observation']].median()
grouped_df.reset_index(inplace=True)
...