Multiple time series in ggplot2

Question

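I need to do some plotting, and I have been learning ggplot2, but I can't quite figure out how to make it work with the dataset I'm using. I can't post my actual data here, but I can give a short example of what it looks like. I have two main data frames: one contains total quarterly revenue for various companies, and the other contains quarterly revenue for the segments within each company. For example: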

Quarter, CompA, CompB, CompC...
2011.0, 1, 2, 3...
2011.25, 2, 3, 4...
2011.5, 3, 4, 5...
2011.75, 4, 5, 6...
2012.0, 5, 6, 7...

and

Quarter, CompA_Footwear, CompA_Apparel, CompB_Wholesale...
2011.0, 1, 2, 3...
2011.25, 2, 3, 4...
2011.5, 3, 4, 5...
2011.75, 4, 5, 6...
2012.0, 5, 6, 7...

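The script I've been building loops through each company in the first table and uses select() to grab all of that company's columns from the second table. For the purposes of this question, ignore the other companies and assume the first table is just CompA and the second table is all of the different CompA segments.

What I want is a line graph for each segment, containing the total company revenue and that segment's revenue plotted over time. Something like this. Ideally, I'd like to use facet_wrap() or something similar to produce all of the per-segment graphs at once, but that isn't strictly necessary. To be clear, each individual graph should have only two lines: the whole company and one specific segment.

I'm fine with restructuring my data in whatever way is necessary. Does anyone know how I could do this?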

Answer 1

I think the following should work. Note that you will need to reshape your data a fair amount.

# Load packages
library(dplyr)
library(ggplot2)
library(reshape2)
library(tidyr)

Make a reproducible dataset:

# Create companies
# Could pull this from column names in your data
companies <- paste0("Comp",LETTERS[1:4])

set.seed(12345)

sepData <-
  lapply(companies, function(thisComp){
    nDiv <- sample(3:6,1)
    temp <- 
      sapply(1:nDiv,function(idx){
        round(rnorm(24, rnorm(1,100,25), 6))
      }) %>%
      as.data.frame() %>%
      setNames(paste(thisComp,sample(letters,nDiv), sep = "_"))
  }) %>%
  bind_cols()

sepData$Quarter <-
  rep(2010:2015
      , each = 4) +
  (0:3)/4

meltedSep <-
  melt(sepData, id.vars = "Quarter"
       , value.name = "Revenue") %>%
  separate(variable
           , c("Company","Division")
           , sep = "_") %>%
  mutate(Division = factor(Division
                           , levels = c(sort(unique(Division))
                                        , "Total")))

fullCompany <-
  meltedSep %>%
  group_by(Company, Quarter) %>%
  summarise(Revenue = sum(Revenue)) %>%
  mutate(Division = factor("Total"
                           , levels = levels(meltedSep$Division)))

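Here is the plot you said you wanted. Note that you need to set Division = NULL on the totals to prevent them from showing up in a facet of their own: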

theme_set(theme_minimal())

catch <- lapply(companies, function(thisCompany){
  tempPlot <-
    meltedSep %>%
    filter(Company == thisCompany) %>%
    ggplot(aes(y = Revenue
               , x = Quarter)) +
    geom_line(aes(col = "Division")) +
    facet_wrap(~Division) +
    geom_line(aes(col = "Total")
              , fullCompany %>%
                filter(Company == thisCompany) %>%
                mutate(Division = NULL)
              ) +
    ggtitle(thisCompany) +
    scale_color_manual(values = c(Division = "darkblue"
                                  , Total = "green3"))
  print(tempPlot)
})
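The reason dropping Division works: when a layer's data has no column for the faceting variable, ggplot2 repeats that layer in every panel. Here is a minimal sketch of that behaviour with made-up data; the names `segments` and `total` are illustrative and not from the question.

```r
library(ggplot2)

# Made-up data: two divisions over four quarters
segments <- data.frame(
  Quarter  = rep(c(2011, 2011.25, 2011.5, 2011.75), times = 2),
  Division = rep(c("a", "b"), each = 4),
  Revenue  = c(1, 2, 3, 4, 2, 3, 4, 5)
)

# Company total per quarter; note that there is no Division column
total <- aggregate(Revenue ~ Quarter, data = segments, FUN = sum)

p <- ggplot(segments, aes(x = Quarter, y = Revenue)) +
  geom_line(aes(col = "Division")) +
  # This layer's data lacks the faceting variable, so it is drawn in every panel
  geom_line(aes(col = "Total"), data = total) +
  facet_wrap(~Division)

# Inspect the computed data for the second layer: it has rows in both panels
total_layer <- layer_data(p, 2)
table(total_layer$PANEL)
```

If the totals kept a Division column with the value "Total", they would instead be placed in a separate "Total" panel.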

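Example output of the per-company loop above:

Note, however, that this looks rather terrible. The difference between "Total" and any one division will always be large. Instead, you may want to plot all of the divisions on a single graph: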

allData <-
  bind_rows(meltedSep, fullCompany)

catch <- lapply(companies, function(thisCompany){
  tempPlot <-
    allData %>%
    filter(Company == thisCompany) %>%
    ggplot(aes(y = Revenue
               , x = Quarter
               , col = Division)) +
    geom_line() +
    ggtitle(thisCompany)
    # I would add manual colors here, assigned so that, e.g. "Clothes" is always the same
  print(tempPlot)
})
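The comment about manual colors could be handled with a named vector passed to scale_color_manual(), which matches colors to levels by name rather than by position. This is only a sketch: the division names and colors below are hypothetical placeholders.

```r
library(ggplot2)

# Hypothetical division names and colors; substitute your own
division_colors <- c(Footwear = "steelblue", Apparel = "firebrick", Total = "black")

# Made-up data containing only some of the named divisions
example <- data.frame(
  Quarter  = rep(c(2011, 2011.25, 2011.5), times = 2),
  Division = rep(c("Apparel", "Total"), each = 3),
  Revenue  = c(2, 3, 4, 5, 7, 9)
)

p <- ggplot(example, aes(x = Quarter, y = Revenue, col = Division)) +
  geom_line() +
  # Colors are looked up by name, so "Apparel" gets the same color in every plot
  scale_color_manual(values = division_colors)

# The colors actually assigned to the lines
used_colors <- unique(layer_data(p, 1)$colour)
```

Because the lookup is by name, a company that lacks one of the divisions still gets consistent colors for the divisions it does have.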

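Example of the resulting single-panel plot:

The gap between the total and the divisions is still large, but at least you can compare the divisions with one another.

If it were me, I would probably make two plots: one of each division within each company (faceted), and one of the totals: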

meltedSep %>%
  ggplot(aes(y = Revenue
             , x = Quarter
             , col = Division)) +
  geom_line() +
  facet_wrap(~Company)

fullCompany %>%
  ggplot(aes(y = Revenue
             , x = Quarter
             , col = Company)) +
  geom_line()

Answer 2

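I can think of two ways to do this with facet_wrap():

Use annotate() in ggplot2 (the easy way)
Duplicate your data frame for each company (still relatively simple, just more error-prone)

Either way, let's recreate your two data frames so that we can reproduce your example:

First, create the "total company revenue" data frame: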

Quarter <- seq(2011, 2012, by = .25)
CompA <- as.integer(runif(5, 5, 15))
CompB <- as.integer(runif(5, 6, 16))
CompC <- as.integer(runif(5, 7, 17))
df1 <- data.frame(Quarter, CompA, CompB, CompC)

Next, the "segment revenue" data frame for Company A:

CompA_Footwear <- as.integer(runif(5, 0, 5))
CompA_Apparel <- as.integer(runif(5,1 , 6))
CompA_Wholesale <- as.integer(runif(5, 2, 7))
df2 <- data.frame(Quarter, CompA_Footwear, CompA_Apparel, CompA_Wholesale)

Now we'll rearrange your data into something more recognizable to ggplot2, using melt() from reshape2:

require(reshape2)
melt.df1 <- melt(df1, id = "Quarter")
melt.df2 <- melt(df2, id = "Quarter")
df <- rbind(melt.df1, melt.df2)
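As an aside, the same wide-to-long reshaping can be done with tidyr's pivot_longer() if you prefer not to depend on reshape2. This is a sketch with made-up values, not a replacement for the code above; `long.df1` is an illustrative name.

```r
library(tidyr)

# Made-up totals in the same wide shape as df1
wide <- data.frame(
  Quarter = seq(2011, 2012, by = 0.25),
  CompA   = c(5, 6, 7, 8, 9),
  CompB   = c(6, 7, 8, 9, 10)
)

# Every column except Quarter becomes a (variable, value) pair
long.df1 <- pivot_longer(wide, cols = -Quarter,
                         names_to = "variable", values_to = "value")
```

Two differences from melt() are worth keeping in mind: `variable` comes back as a character column rather than a factor, and the rows are ordered quarter by quarter rather than column by column, so any code that relies on row position would need the result sorted by `variable` first.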

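We're now almost ready to plot. For this example, I'll focus only on Company A.

Using `annotate()`

Subset the data so that it contains only the "segment revenue" for Company A: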

CompA.df2 <- df[grep("CompA_", df$variable),]

This assumes that all of your segment revenue columns are named with the prefix "CompA_". You will have to subset according to your own data.
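One caveat about prefix matching: the pattern "CompA_" matches anywhere in a name, so a column such as the hypothetical "NewCompA_Retail" would also be picked up. Anchoring the pattern with ^, or using startsWith(), restricts the match to a true prefix. A small sketch with made-up column names:

```r
# Made-up column names; "NewCompA_Retail" is a hypothetical troublemaker
cols <- c("CompA", "CompA_Footwear", "CompB_Wholesale", "NewCompA_Retail")

# Unanchored: matches the pattern anywhere in the string
loose <- grep("CompA_", cols, value = TRUE)

# Anchored: matches only names that begin with the prefix
strict <- grep("^CompA_", cols, value = TRUE)

# Base R equivalent without regular expressions
strict2 <- cols[startsWith(cols, "CompA_")]
```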

Now plot:

require(ggplot2)
ggplot(data = CompA.df2, aes(x = Quarter, y = value,
                            group = variable, colour = variable)) +
  geom_line() +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  facet_wrap(~variable) + # Facets by segment
  # Next, adds the total revenue data as an annotation
  annotate(geom = "line", x = Quarter, y = df1$CompA) + 
  annotate(geom = "point", x = Quarter, y = df1$CompA)

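Basically, we are just annotating the plot with lines and points from the original "total company revenue" data frame for Company A. The main drawback is the lack of a legend.

The second approach produces a legend for all of the values.

Duplicating the data

Because of how facet_wrap() works, every line we want drawn in a facet must carry the same value of the faceting variable. So we will duplicate the total revenue once for each level of "segment revenue" and pair each copy with its segment.

Using the same data frame as above, we separate out Company A's total revenue and Company A's segment revenue: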

CompA.df1 <- df[which(df$variable == "CompA"),] # Total Company A Revenue
CompA.df2 <- droplevels(df[grep("CompA_", df$variable),]) # Segment Revenue of Company A

Now repeat the Company A total revenue data frame once for each level that "segment revenue" has:

rep.CompA.df1 <- CompA.df1[rep(seq_len(nrow(CompA.df1)), nlevels(CompA.df2$variable)), ]

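If you have NAs or NaNs, you may run into errors here.

Now combine the repeated data frame with the segment data and add a faceting variable (here facet.var) to pair them up.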

CompA.df3 <- rbind(rep.CompA.df1, CompA.df2)
CompA.df3$facet.var <- rep(CompA.df2$variable,2)
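The rep() call above depends on both data frames being in the same row order. A variation that does not rely on row position labels each copy of the totals explicitly with the segment it is paired with. This is a sketch with made-up values; `total`, `segments`, `total_copies`, and `paired` are illustrative names.

```r
# Made-up long-format data in the same shape as the melt() output
Quarter <- seq(2011, 2012, by = 0.25)
total <- data.frame(Quarter, variable = "CompA", value = c(10, 11, 12, 13, 14))
segments <- data.frame(
  Quarter  = rep(Quarter, times = 2),
  variable = rep(c("CompA_Footwear", "CompA_Apparel"), each = 5),
  value    = c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6)
)

# Each segment row belongs to its own facet
segments$facet.var <- segments$variable

# One copy of the totals per segment, labelled with that segment's name
total_copies <- do.call(rbind, lapply(unique(segments$variable), function(seg) {
  transform(total, facet.var = seg)
}))

paired <- rbind(total_copies, segments)

# Each facet should hold one total series and one segment series
table(paired$facet.var, paired$variable)
```

The cross-tabulation at the end is a quick check that every facet contains exactly the total and its own segment.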

Now you're ready to plot. You still define group = variable, but this time we set facet_wrap() to our newly created facet.var:

require(ggplot2)
ggplot(data = CompA.df3, aes(x = Quarter, y = value,
                             group = variable, colour = variable)) +
  geom_line() +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
  facet_wrap(~facet.var)

As you can see, "total revenue" now appears in the legend:


Answer 1
1 accepted 2016-07-07 19:37:19

Answer 2
1 2016-07-07 20:34:55