Plot multiple dataframes in one plot with facet_wrap

Question

I have a dataset df that looks like this:

ID      Week    VarA    VarB    VarC    VarD
s001    w1      2       5       4       7
s001    w2      4       5       2       3
s001    w3      7       2       0       1
s002    w1      4       0       9       8
s002    w2      1       5       2       5
s002    w3      7       3       6       0
s001    w1      6       5       7       9
s003    w2      2       0       1       0
s003    w3      6       9       3       4

For each ID, I am trying to plot the week-by-week progression of all its Vars (VarB, VarC, VarD), with VarA as the reference data.

I ran df.melt() to get the long format below, then ran the code that follows, and it works.

ID     Week  Var  Value
s001    w1  VarA    2
s001    w2  VarA    4
s001    w3  VarA    7
s002    w1  VarA    4
s002    w2  VarA    1
s002    w3  VarA    7
s001    w1  VarA    6
s003    w2  VarA    2
s003    w3  VarA    6
s001    w1  VarB    5
s001    w2  VarB    5
...
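For reference, the long format above can be produced along these lines. This is a sketch that assumes var_name="Var" and value_name="Value" were passed, since the question only mentions df.melt(); the data is copied from the table above.

```python
import pandas as pd

# Wide-format data from the question (values copied from the table above).
df = pd.DataFrame({
    "ID":   ["s001", "s001", "s001", "s002", "s002", "s002", "s001", "s003", "s003"],
    "Week": ["w1", "w2", "w3", "w1", "w2", "w3", "w1", "w2", "w3"],
    "VarA": [2, 4, 7, 4, 1, 7, 6, 2, 6],
    "VarB": [5, 5, 2, 0, 5, 3, 5, 0, 9],
    "VarC": [4, 2, 0, 9, 2, 6, 7, 1, 3],
    "VarD": [7, 3, 1, 8, 5, 0, 9, 0, 4],
})

# Keep ID and Week as identifier columns; each VarX column becomes rows.
# var_name/value_name are assumed so the columns match the output shown.
df_melt = df.melt(id_vars=["ID", "Week"], var_name="Var", value_name="Value")

print(df_melt.head(3))
```

melt stacks the variables one after another, so all VarA rows come first, then VarB, and so on, matching the order shown above.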

Code:

from plotnine import (ggplot, aes, geom_point, labs, ggtitle,
                      theme, element_text)

# Assumes df_melt (the melted df) and idlist (the list of IDs) already exist
for id in idlist:

    # Get VarA into a new df
    newdf = df_melt[df_melt.Var == 'VarA']

    # Remove rows with VarA so they won't be included in facet_wrap()
    tmp = df_melt[df_melt.Var != 'VarA']

    plot2 = (ggplot() + ggtitle(id) + labs(x='Week', y='Value')
             + geom_point(newdf[newdf['ID'] == id], aes(x='Week', y='Value'))
             + geom_point(tmp[tmp['ID'] == id], aes(x='Week', y='Value', color='Var'))
             + theme(axis_text_x=element_text(rotation=45)))

    print(plot2)

However, when I add facet_wrap('Var', ncol=3, scales='free'), I get the following error:

IndexError: arrays used as indices must be of integer (or boolean) type

I also can't connect the points with geom_line().

This is my expected output: [image]

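Is this because I'm using different dataframes? Is there a way to use multiple geom_point() layers with different dataframes together with facet_wrap in a single ggplot object?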

Answer 1

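The problem in the question is a bug, which the code below reproduces. The bug has been fixed, and the fix will be included in the next release of plotnine.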

import pandas as pd
from plotnine import *

df1 = pd.DataFrame({
    'x': list("abc"),
    'y': [1, 2, 3],
    'g': list("AAA")
})

df2 = pd.DataFrame({
    'x': list("abc"),
    'y': [4, 5, 6],
    'g': list("AAB")
})

(ggplot(aes("x", "y"))
 + geom_point(df1)
 + geom_point(df2)
 + facet_wrap("g", scales="free_x")
)

Answer 2

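Besides the bug fix mentioned by @has2k1, I found a workaround for adding the VarA reference data points: rename the Var column of the reference dataframe to something else (here RefVar). The two dataframes then no longer share a Var column, so facet_wrap facets only the dataframe that has it, and the reference layer is drawn in every panel.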

from plotnine import (ggplot, aes, geom_point, facet_wrap, labs, ggtitle,
                      theme, element_text)

# Get VarA into a new df; rename 'Var' so facet_wrap() does not split this layer.
# rename() returns a new frame, so no inplace change on a filtered slice is needed.
newdf = df_melt[df_melt.Var == 'VarA'].rename(columns={'Var': 'RefVar'})

# Remove rows with VarA so they won't be included in facet_wrap()
tmp = df_melt[df_melt.Var != 'VarA']

for pt in idlist:
    plot2 = (ggplot()
             + geom_point(tmp[tmp['ID'] == pt], aes(x='Week', y='Value', color='Var'))
             + facet_wrap('Var', ncol=1, scales='free')
             + geom_point(newdf[newdf['ID'] == pt], aes(x='Week', y='Value'))
             + labs(x='Week', y='Value') + ggtitle(pt)
             + theme(axis_text_x=element_text(rotation=45),
                     subplots_adjust={'hspace': 0.6}))

    print(plot2)
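The data-preparation step of this workaround can be checked in isolation with pandas alone. This is a minimal sketch on a small hypothetical frame with the same columns as df_melt (values from the question's s001 rows); the names ref and others are illustrative. Using rename() without inplace=True avoids modifying a filtered slice, which can trigger a SettingWithCopyWarning in pandas versions without Copy-on-Write.

```python
import pandas as pd

# Hypothetical long-format frame with the same columns as df_melt.
df_melt = pd.DataFrame({
    "ID":    ["s001"] * 6,
    "Week":  ["w1", "w2", "w3"] * 2,
    "Var":   ["VarA"] * 3 + ["VarB"] * 3,
    "Value": [2, 4, 7, 5, 5, 2],
})

# Reference rows: VarA only, with "Var" renamed so facet_wrap("Var")
# finds no faceting column in this layer.
ref = df_melt[df_melt["Var"] == "VarA"].rename(columns={"Var": "RefVar"})

# Remaining rows: the variables that should get their own facet panels.
others = df_melt[df_melt["Var"] != "VarA"]

print(ref.columns.tolist())
print(others["Var"].unique().tolist())
```

Only others carries a Var column, so it is the only layer that facet_wrap("Var") splits into panels.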

2 answers

Answer 1
2 votes, accepted, 2022-11-12 10:46:27

Answer 2
0 votes, 2022-11-14 07:01:42
