How to plot multiple lines from a dataframe

I have the following data:

import pandas as pd

# using the data dict at the bottom of the question
df_uplift_percentile = pd.DataFrame.from_dict(data, 'index')
df_uplift_percentile.index.name = 'percentile'

# display(df_uplift_percentile)
            n_treatment  n_control  response_rate_treatment  response_rate_control    uplift  std_treatment  std_control  std_uplift
percentile                                                                                                                          
0-10                217        983                 0.041475               0.004069  0.037405       0.013535     0.002030    0.013687
10-20               145       1055                 0.013793               0.000948  0.012845       0.009686     0.000947    0.009732
20-30               149       1051                 0.000000               0.000000  0.000000       0.000000     0.000000    0.000000
30-40               383        817                 0.010444               0.009792  0.000652       0.005195     0.003445    0.006233
40-50               354        846                 0.005650               0.005910 -0.000260       0.003984     0.002635    0.004776
50-60               423        777                 0.033097               0.029601  0.003496       0.008698     0.006080    0.010612
60-70               588        611                 0.132653               0.155483 -0.022830       0.013988     0.014660    0.020263
70-80               673        526                 0.178306               0.161597  0.016709       0.014755     0.016049    0.021801
80-90               881        318                 0.155505               0.261006 -0.105501       0.012209     0.024628    0.027488
90-100              938        261                 0.152452               0.333333 -0.180881       0.011737     0.029179    0.031451

I want to plot response_rate_treatment, response_rate_control, and uplift by percentile (x-axis) as a line chart, with each line in a different color.

I am trying the code below. What mistake am I making that causes it to plot many lines instead of just 3?

import matplotlib.pyplot as plt

plt.figure(figsize=(20,15))


percentile = df_uplift_percentile.values

response_rate_treatment = df_uplift_percentile["response_rate_treatment"].values

response_rate_control = df_uplift_percentile["response_rate_control"].values

uplift= df_uplift_percentile["uplift"].values

plt.plot(percentile,response_rate_treatment,label= "Treatment Response Rate", color = 'green' )
plt.plot(percentile,response_rate_control,label = "Control Response Rate", color = 'yellow' )
plt.plot(percentile,uplift,label = "Uplift", color = 'red' )

plt.legend()
plt.ylabel("Uplift = Treatment Response Rate- Control Response Rate")

Current Plot Result

[image]

Reproducible Data

  • Data dict
data =\
{'0-10': {'n_treatment': 217,
  'n_control': 983,
  'response_rate_treatment': 0.041475,
  'response_rate_control': 0.004069,
  'uplift': 0.037405,
  'std_treatment': 0.013535,
  'std_control': 0.00203,
  'std_uplift': 0.013687},
 '10-20': {'n_treatment': 145,
  'n_control': 1055,
  'response_rate_treatment': 0.013793,
  'response_rate_control': 0.000948,
  'uplift': 0.012845,
  'std_treatment': 0.009686,
  'std_control': 0.000947,
  'std_uplift': 0.009732},
 '20-30': {'n_treatment': 149,
  'n_control': 1051,
  'response_rate_treatment': 0.0,
  'response_rate_control': 0.0,
  'uplift': 0.0,
  'std_treatment': 0.0,
  'std_control': 0.0,
  'std_uplift': 0.0},
 '30-40': {'n_treatment': 383,
  'n_control': 817,
  'response_rate_treatment': 0.010444,
  'response_rate_control': 0.009792,
  'uplift': 0.000652,
  'std_treatment': 0.005195,
  'std_control': 0.003445,
  'std_uplift': 0.006233},
 '40-50': {'n_treatment': 354,
  'n_control': 846,
  'response_rate_treatment': 0.00565,
  'response_rate_control': 0.00591,
  'uplift': -0.00026,
  'std_treatment': 0.003984,
  'std_control': 0.002635,
  'std_uplift': 0.004776},
 '50-60': {'n_treatment': 423,
  'n_control': 777,
  'response_rate_treatment': 0.033097,
  'response_rate_control': 0.029601,
  'uplift': 0.003496,
  'std_treatment': 0.008698,
  'std_control': 0.00608,
  'std_uplift': 0.010612},
 '60-70': {'n_treatment': 588,
  'n_control': 611,
  'response_rate_treatment': 0.132653,
  'response_rate_control': 0.155483,
  'uplift': -0.02283,
  'std_treatment': 0.013988,
  'std_control': 0.01466,
  'std_uplift': 0.020263},
 '70-80': {'n_treatment': 673,
  'n_control': 526,
  'response_rate_treatment': 0.178306,
  'response_rate_control': 0.161597,
  'uplift': 0.016709,
  'std_treatment': 0.014755,
  'std_control': 0.016049,
  'std_uplift': 0.021801},
 '80-90': {'n_treatment': 881,
  'n_control': 318,
  'response_rate_treatment': 0.155505,
  'response_rate_control': 0.261006,
  'uplift': -0.105501,
  'std_treatment': 0.012209,
  'std_control': 0.024628,
  'std_uplift': 0.027488},
 '90-100': {'n_treatment': 938,
  'n_control': 261,
  'response_rate_treatment': 0.152452,
  'response_rate_control': 0.333333,
  'uplift': -0.180881,
  'std_treatment': 0.011737,
  'std_control': 0.029179,
  'std_uplift': 0.031451}}
  • The correct way to plot many columns as lines is to use pandas.DataFrame.plot, which uses matplotlib as the default backend.
    • This reduces your plotting code from 10 lines to 2 lines.
  • 'percentile' is already the index, so any selected columns will be plotted with the index as the x-axis.
    • If 'percentile' were a column, it would be passed to .plot as x='percentile', and its position would need to be added to .iloc.
    • Use .iloc to select the columns by position, or use .loc to select the columns by name.
    • Alternatively, pass the column names to the y parameter. With long column names, it's shorter to use .iloc.
      • df_uplift_percentile.plot(y=[...], ...) to plot only certain columns
      • df_uplift_percentile.plot(...) to plot all columns
  • Changing the labels for the legend can be accomplished in two ways:
    1. Use .rename to change the column names
    2. Use .legend and pass a list for the labels (shown below)
  • Tested in python 3.8.12, pandas 1.3.3, matplotlib 3.4.3
ax = df_uplift_percentile.iloc[:, [2, 3, 4]].plot(xticks=range(len(df_uplift_percentile)), figsize=(10, 6), color=['green', 'yellow', 'r'],
                                                  ylabel='Uplift = Treatment Response Rate- Control Response Rate')
ax.legend(['Treatment Response Rate', 'Control Response Rate', 'Uplift'])

[image]
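The alternatives listed in the bullets above (the y parameter, .rename for legend labels, and the case where 'percentile' is a column) can be sketched as follows. This is a minimal sketch, not the answer's own code: it assumes an abbreviated three-row copy of the question's data, and the names df, cols, colors, and labels are introduced only for illustration. The Agg backend is selected so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import pandas as pd

# Abbreviated copy of the question's data: first three rows, plotted columns only
data = {
    '0-10': {'response_rate_treatment': 0.041475, 'response_rate_control': 0.004069, 'uplift': 0.037405},
    '10-20': {'response_rate_treatment': 0.013793, 'response_rate_control': 0.000948, 'uplift': 0.012845},
    '20-30': {'response_rate_treatment': 0.0, 'response_rate_control': 0.0, 'uplift': 0.0},
}
df = pd.DataFrame.from_dict(data, 'index')
df.index.name = 'percentile'

cols = ['response_rate_treatment', 'response_rate_control', 'uplift']
colors = ['green', 'yellow', 'red']

# Option 1: select columns by name with y=; the legend shows the raw column names
ax_y = df.plot(y=cols, color=colors, xticks=range(len(df)))

# Option 2: rename the columns first so the legend picks up readable labels
labels = {
    'response_rate_treatment': 'Treatment Response Rate',
    'response_rate_control': 'Control Response Rate',
    'uplift': 'Uplift',
}
ax_renamed = df.rename(columns=labels).plot(color=colors, xticks=range(len(df)))

# If 'percentile' were a column instead of the index, pass it as x=
ax_col = df.reset_index().plot(x='percentile', y=cols, color=colors)
```

Each call draws on its own new figure because no ax argument is passed. Renaming keeps the labels attached to the data, whereas ax.legend([...]) relies on the list order matching the column order.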

Use:

percentile = df_uplift_percentile.index

instead of

percentile = df_uplift_percentile.values
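The reason this matters: df_uplift_percentile.values is a 2D array with one column per dataframe column, and matplotlib's plot draws a separate line for every column of a 2D x. With the question's 8 columns, each of the three plt.plot calls therefore draws 8 lines. The sketch below illustrates the difference on an abbreviated four-row, three-column copy of the data. The names df, x_wrong, x_right, and the axes variables are introduced only for illustration, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Abbreviated copy of the question's data: four rows, three columns
data = {
    '0-10': {'response_rate_treatment': 0.041475, 'response_rate_control': 0.004069, 'uplift': 0.037405},
    '10-20': {'response_rate_treatment': 0.013793, 'response_rate_control': 0.000948, 'uplift': 0.012845},
    '20-30': {'response_rate_treatment': 0.0, 'response_rate_control': 0.0, 'uplift': 0.0},
    '30-40': {'response_rate_treatment': 0.010444, 'response_rate_control': 0.009792, 'uplift': 0.000652},
}
df = pd.DataFrame.from_dict(data, 'index')
df.index.name = 'percentile'

x_wrong = df.values  # 2D array of every cell, shape (4, 3)
x_right = df.index   # 1D sequence of the percentile labels
y = df['uplift'].values

fig, (ax_wrong, ax_right) = plt.subplots(1, 2)

# A 2D x produces one line per column of x, all sharing the same y
lines_wrong = ax_wrong.plot(x_wrong, y, color='red')

# A 1D x produces the single intended line
lines_right = ax_right.plot(x_right, y, color='red')

print(x_wrong.shape, len(lines_wrong), len(lines_right))
```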
