Python/Plotly: How to make each data point on Scatter plot represent median value?

Question

Here is my dataset:

import numpy as np
import pandas as pd

ob1=np.linspace(1, 10, 13).round(2).tolist()
ob2=np.linspace(10, 1, 12).round(2).tolist()
ob=ob1+ob2

# Vendor A has two observations for month 1, so it has 13 rows; vendor B has 12
ex_dic={'Vendor':['A','A','A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B','B'],
       'Month':[1,1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12],
       'Observation':ob
       }
ex_df=pd.DataFrame.from_dict(ex_dic)

It looks like this:

Here is the code for my Plotly visualization:

import plotly.graph_objects as go

ex_month_list=ex_df.Month.unique().tolist()
ex_vendor_list=ex_df.Vendor.unique().tolist()

fig=go.Figure()

# one trace per vendor, plotting every raw observation
for i in ex_vendor_list:
    by_vendor_df=ex_df.loc[ex_df['Vendor']==i]
    fig.add_trace(go.Scatter(x=by_vendor_df.Month, y=by_vendor_df.Observation, name=str(i),
                             mode='lines+markers', marker_line_width=2, marker_size=8))

fig.show()

It produces a plot like this: the Y-axis shows the observations (1-10) and the X-axis shows the months (1-12).

Here is where the problem is:

I have tried applying median() in various places but cannot get my plot to show the median observation for each month. For example, here is what I have come up with so far (in terms of logic):

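import statistics as stat  # assumption: `stat` refers to the standard-library statistics module

for i in vendor_list:
    vendor_df=some_df.loc[some_df['Vendor']==i]
    for m in month_list:
        # all observations for this vendor in month m
        month_df=vendor_df.loc[vendor_df['Month']==m]
        by_month_observations=month_df['Observation'].to_list()
        median_val=stat.median(by_month_observations)
        print(median_val)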

The code above does return the median values and works fine, BUT now that some months have gone from 2 observations to 1, I cannot append the result back to the dataframe because the lengths no longer match. So I am not sure this is the best way to go.

Please let me know, based on the code above, what a smart way to do this would be, so that each plotted data point is the median value for each month by vendor. Help is really appreciated!

Answer 1

Well, I figured out how to do it myself: a simple use of .groupby() did the job!

Here is the df I used while trying to solve my problem:

some_dic={'Vendor':['A','A','A','A','B','B','B','B','B'],
       'Month':[6,7,8,8,6,7,8,8,8],
       'Observation':[1,2,3,4,10,8,6,3,1]
         }
some_df=pd.DataFrame.from_dict(some_dic)

Here is the code that successfully generated the plot with median values:

...
grouped_df=vendor_df.groupby(vendor_df.Month)[['Observation']].median()
grouped_df.reset_index(inplace=True)
...

solution1
1 ACCPTED 2019-11-12 19:15:56