
Python/Plotly: How to make each data point on a scatter plot represent the median value?

Here is my dataset:

import numpy as np
import pandas as pd

ob1=np.linspace(1, 10, 13).round(2).tolist()
ob2=np.linspace(10, 1, 12).round(2).tolist()
ob=ob1+ob2

ex_dic={'Vendor':['A','A','A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B','B'],
       'Month':[1,1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12],
       'Observation':ob
       }
ex_df=pd.DataFrame.from_dict(ex_dic)

It looks like this:

[Image: dataframe]

Here is code for my Plotly visualization:

import plotly.graph_objects as go

ex_month_list=ex_df.Month.unique().tolist()
ex_vendor_list=ex_df.Vendor.unique().tolist()

fig=go.Figure()

for i in ex_vendor_list:
    by_vendor_df=ex_df.loc[ex_df['Vendor']==i]
    fig.add_trace(go.Scatter(x=by_vendor_df.Month, y=by_vendor_df.Observation, name=str(i),
                             mode='lines+markers', marker_line_width=2, marker_size=8))

fig.show()

It will show something like this: [Image: scatter plot] The y-axis shows the observations (1-10) and the x-axis shows the months (1-12).

Here is where the problem is:

[Image]

I have tried applying median() in various places but cannot get my plot to show the median observation for each month. For example, here is what I have come up with so far (in terms of logic):

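import statistics as stat  # assumption: `stat` refers to the standard-library statistics module

# assumption: built the same way as ex_vendor_list / ex_month_list above
vendor_list=some_df.Vendor.unique().tolist()
month_list=some_df.Month.unique().tolist()

for i in vendor_list: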
    vendor_df=some_df.loc[some_df['Vendor']==i]
    for m in month_list:
        month_df=vendor_df.loc[vendor_df['Month']==m]
        by_month_observations=month_df['Observation'].to_list()
        median_val=stat.median(by_month_observations)
        print(median_val)

The code above does return the median values, and it works fine. BUT now that some months went from 2 observations to 1, I cannot append the result back to the dataframe because the lengths are no longer the same. So I am not sure this is the best way to go.

Looking at the code above, what is a smart way to go about this so that each plotted data point is the median value for each month, by vendor? Help is really appreciated!

Well, I figured out how to do it myself: a simple use of .groupby() did the job!

Here is the df I used while trying to solve my problem:

some_dic={'Vendor':['A','A','A','A','B','B','B','B','B'],
       'Month':[6,7,8,8,6,7,8,8,8],
       'Observation':[1,2,3,4,10,8,6,3,1]
         }
some_df=pd.DataFrame.from_dict(some_dic)
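
As a quick check of what the medians should be for this frame, here is a minimal sketch that groups on both Vendor and Month in one call. The name medians is only for illustration, and grouping on both columns at once is an alternative to the per-vendor loop rather than the code used for the plot. Because the aggregation builds a new frame with one row per (Vendor, Month) pair, there is nothing to append back to the original df, so the length mismatch does not come up.

```python
import pandas as pd

# Same sample frame as above, repeated so the snippet runs on its own.
some_df = pd.DataFrame({
    'Vendor': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'Month': [6, 7, 8, 8, 6, 7, 8, 8, 8],
    'Observation': [1, 2, 3, 4, 10, 8, 6, 3, 1],
})

# One row per (Vendor, Month); as_index=False keeps the keys as ordinary columns.
medians = some_df.groupby(['Vendor', 'Month'], as_index=False)['Observation'].median()
print(medians)
```

For vendor A, month 8 has observations 3 and 4, so its median is 3.5; for vendor B, month 8 has 6, 3 and 1, so its median is 3.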

Here is the code that successfully generated the plot with median values:

...
grouped_df=vendor_df.groupby(vendor_df.Month)[['Observation']].median()
grouped_df.reset_index(inplace=True)
...
