How to plot multiple lines from a dataframe
I have the following data:
import pandas as pd
# using the data dict at the bottom of the question
df_uplift_percentile = pd.DataFrame.from_dict(data, 'index')
df_uplift_percentile.index.name = 'percentile'
# display(df_uplift_percentile)
n_treatment n_control response_rate_treatment response_rate_control uplift std_treatment std_control std_uplift
percentile
0-10 217 983 0.041475 0.004069 0.037405 0.013535 0.002030 0.013687
10-20 145 1055 0.013793 0.000948 0.012845 0.009686 0.000947 0.009732
20-30 149 1051 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
30-40 383 817 0.010444 0.009792 0.000652 0.005195 0.003445 0.006233
40-50 354 846 0.005650 0.005910 -0.000260 0.003984 0.002635 0.004776
50-60 423 777 0.033097 0.029601 0.003496 0.008698 0.006080 0.010612
60-70 588 611 0.132653 0.155483 -0.022830 0.013988 0.014660 0.020263
70-80 673 526 0.178306 0.161597 0.016709 0.014755 0.016049 0.021801
80-90 881 318 0.155505 0.261006 -0.105501 0.012209 0.024628 0.027488
90-100 938 261 0.152452 0.333333 -0.180881 0.011737 0.029179 0.031451
I want to plot response_rate_treatment, response_rate_control, and uplift by percentile (x-axis) as a line chart, with each line in a different color.
I am trying the code below. What mistake am I making that makes it plot a lot of charts instead of just 3 lines?
plt.figure(figsize=(20,15))
percentile = df_uplift_percentile.values
response_rate_treatment = df_uplift_percentile["response_rate_treatment"].values
response_rate_control = df_uplift_percentile["response_rate_control"].values
uplift= df_uplift_percentile["uplift"].values
plt.plot(percentile,response_rate_treatment,label= "Treatment Response Rate", color = 'green' )
plt.plot(percentile,response_rate_control,label = "Control Response Rate", color = 'yellow' )
plt.plot(percentile,uplift,label = "Uplift", color = 'red' )
plt.legend()
plt.ylabel("Uplift = Treatment Response Rate- Control Response Rate")
data =\
{'0-10': {'n_treatment': 217,
'n_control': 983,
'response_rate_treatment': 0.041475,
'response_rate_control': 0.004069,
'uplift': 0.037405,
'std_treatment': 0.013535,
'std_control': 0.00203,
'std_uplift': 0.013687},
'10-20': {'n_treatment': 145,
'n_control': 1055,
'response_rate_treatment': 0.013793,
'response_rate_control': 0.000948,
'uplift': 0.012845,
'std_treatment': 0.009686,
'std_control': 0.000947,
'std_uplift': 0.009732},
'20-30': {'n_treatment': 149,
'n_control': 1051,
'response_rate_treatment': 0.0,
'response_rate_control': 0.0,
'uplift': 0.0,
'std_treatment': 0.0,
'std_control': 0.0,
'std_uplift': 0.0},
'30-40': {'n_treatment': 383,
'n_control': 817,
'response_rate_treatment': 0.010444,
'response_rate_control': 0.009792,
'uplift': 0.000652,
'std_treatment': 0.005195,
'std_control': 0.003445,
'std_uplift': 0.006233},
'40-50': {'n_treatment': 354,
'n_control': 846,
'response_rate_treatment': 0.00565,
'response_rate_control': 0.00591,
'uplift': -0.00026,
'std_treatment': 0.003984,
'std_control': 0.002635,
'std_uplift': 0.004776},
'50-60': {'n_treatment': 423,
'n_control': 777,
'response_rate_treatment': 0.033097,
'response_rate_control': 0.029601,
'uplift': 0.003496,
'std_treatment': 0.008698,
'std_control': 0.00608,
'std_uplift': 0.010612},
'60-70': {'n_treatment': 588,
'n_control': 611,
'response_rate_treatment': 0.132653,
'response_rate_control': 0.155483,
'uplift': -0.02283,
'std_treatment': 0.013988,
'std_control': 0.01466,
'std_uplift': 0.020263},
'70-80': {'n_treatment': 673,
'n_control': 526,
'response_rate_treatment': 0.178306,
'response_rate_control': 0.161597,
'uplift': 0.016709,
'std_treatment': 0.014755,
'std_control': 0.016049,
'std_uplift': 0.021801},
'80-90': {'n_treatment': 881,
'n_control': 318,
'response_rate_treatment': 0.155505,
'response_rate_control': 0.261006,
'uplift': -0.105501,
'std_treatment': 0.012209,
'std_control': 0.024628,
'std_uplift': 0.027488},
'90-100': {'n_treatment': 938,
'n_control': 261,
'response_rate_treatment': 0.152452,
'response_rate_control': 0.333333,
'uplift': -0.180881,
'std_treatment': 0.011737,
'std_control': 0.029179,
'std_uplift': 0.031451}}
The correct way to plot many columns as lines is to use pandas.DataFrame.plot, which uses matplotlib as the default backend.
'percentile' is already the index, so any selected columns will be plotted with the index as the x-axis.
If 'percentile' were a column, it would be passed to .plot as x='percentile', and its position would need to be added to .iloc.
Use .iloc to select the columns by position, or use .loc to select the columns by name.
Alternatively, pass the column names to the y parameter. With long column names, it's shorter to use .iloc.
df_uplift_percentile.plot(y=[...], ...) plots only certain columns.
df_uplift_percentile.plot(...) plots all columns.
Tested in python 3.8.12, pandas 1.3.3, matplotlib 3.4.3
ax = df_uplift_percentile.iloc[:, [2, 3, 4]].plot(xticks=range(len(df_uplift_percentile)), figsize=(10, 6), color=['green', 'yellow', 'r'],
ylabel='Uplift = Treatment Response Rate- Control Response Rate')
ax.legend(['Treatment Response Rate', 'Control Response Rate', 'Uplift'])
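For comparison, here is a minimal sketch of the y-parameter variant, plus the case where 'percentile' is an ordinary column rather than the index. It uses only a three-row, three-column subset of the question's data to stay short; the names df, df_col and cols are illustrative assumptions, not part of the original code.

```python
import pandas as pd

# Three rows and three columns copied from the question's data, for brevity.
data = {
    '0-10': {'response_rate_treatment': 0.041475,
             'response_rate_control': 0.004069,
             'uplift': 0.037405},
    '10-20': {'response_rate_treatment': 0.013793,
              'response_rate_control': 0.000948,
              'uplift': 0.012845},
    '20-30': {'response_rate_treatment': 0.0,
              'response_rate_control': 0.0,
              'uplift': 0.0},
}
df = pd.DataFrame.from_dict(data, orient='index')
df.index.name = 'percentile'

cols = ['response_rate_treatment', 'response_rate_control', 'uplift']
colors = ['green', 'yellow', 'red']

# Case 1: 'percentile' is the index, so only y needs to be given.
ax = df.plot(y=cols, color=colors, figsize=(10, 6))

# Case 2: 'percentile' is a regular column, so it is passed as x.
df_col = df.reset_index()
ax_col = df_col.plot(x='percentile', y=cols, color=colors, figsize=(10, 6))

# In a plain script (outside a notebook), call matplotlib.pyplot.show() to display the figures.
```

Both calls should draw one line per name in cols, with the column names used as legend labels unless ax.legend([...]) overrides them.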
Use:
percentile = df_uplift_percentile.index
instead of
percentile = df_uplift_percentile.values
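The reason this works can be sketched as follows: df.values is a 2-D array (rows by columns), and when matplotlib receives a 2-D x with a 1-D y it draws one line per column of x, whereas df.index is 1-D and yields a single line per call. The example below is a sketch on a three-row subset of the question's data; the variable names df, fig and ax are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Three rows and three columns copied from the question's data, for brevity.
data = {
    '0-10': {'response_rate_treatment': 0.041475,
             'response_rate_control': 0.004069,
             'uplift': 0.037405},
    '10-20': {'response_rate_treatment': 0.013793,
              'response_rate_control': 0.000948,
              'uplift': 0.012845},
    '20-30': {'response_rate_treatment': 0.0,
              'response_rate_control': 0.0,
              'uplift': 0.0},
}
df = pd.DataFrame.from_dict(data, orient='index')
df.index.name = 'percentile'

# df.values holds every column (2-D); df.index holds only the percentile labels (1-D).
percentile = df.index

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(percentile, df['response_rate_treatment'].values,
        label='Treatment Response Rate', color='green')
ax.plot(percentile, df['response_rate_control'].values,
        label='Control Response Rate', color='yellow')
ax.plot(percentile, df['uplift'].values, label='Uplift', color='red')
ax.set_xlabel('percentile')
ax.set_ylabel('Uplift = Treatment Response Rate - Control Response Rate')
ax.legend()
```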