R time series multi-line plot

Question

I have a very large dataset that I'd like to visualize using plotly in R. A sample of my dataset is shown below:

    > new_data_2
# Groups:   newdatum [8]
  date       activity       totaal
  <date>     <fct>          <int>
1 2019-11-21 N11            144
2 2019-09-22 N11            129
3 2019-05-15 N22            117
4 2019-01-23 N22            12
5 2019-07-04 N22            12
6 2019-07-18 N22            12
...

For every activity I want to display the amount (totaal) per date (date) in a time series plot. I can't get it right in R: I think I need to group by activity to display it, but I can't figure out how.

new_data_2 %>% 
group_by(activity) %>% 
plot_ly(x=new_data_2$newdatum) %>% 
add_lines(y=~new_data_2$totaal, color = ~factor(newdatum))

This displays an empty plot, without the 'activity' on the left side.

What I want to achieve is:

[image]

Answer 1

You're on the right track, but after the group_by() you need to tell R to do something with the groups.

new_data_2 %>%
  group_by(activity, date) %>%  # use two groupings since you want by activity & date 
  summarise(totaal_2 = sum(totaal))

That should get you the data frame you're looking for. You can use ggplot and plotly on it from there.

I would recommend reshaping the data first (as above), saving it as a new object, and then graphing it. Doing it this way helps you see each step along the way. Pipes (%>%) are great, but they can make each step difficult to see.
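A minimal sketch of that two-step workflow, assuming the column names from the question. The toy data frame copies only the six sample rows shown there, and the object names daily_totals, g and p are made up for illustration. It uses ggplot2 with ggplotly() as one possible way to get an interactive plot, not necessarily the only one.

```r
library(dplyr)
library(ggplot2)
library(plotly)

# Toy data: the six sample rows from the question (not the full dataset)
new_data_2 <- data.frame(
  date = as.Date(c("2019-11-21", "2019-09-22", "2019-05-15",
                   "2019-01-23", "2019-07-04", "2019-07-18")),
  activity = factor(c("N11", "N11", "N22", "N22", "N22", "N22")),
  totaal = c(144L, 129L, 117L, 12L, 12L, 12L)
)

# Step 1: reshape and save the result as a new object
daily_totals <- new_data_2 %>%
  group_by(activity, date) %>%
  summarise(totaal_2 = sum(totaal), .groups = "drop")

# Inspect the intermediate result before plotting
print(daily_totals)

# Step 2: one line per activity, then convert to an interactive plotly object
g <- ggplot(daily_totals, aes(x = date, y = totaal_2, colour = activity)) +
  geom_line()
p <- ggplotly(g)
```

Typing p in an interactive session should display the plot. Keeping daily_totals as a separate object makes it easy to check the summarised values before any plotting code runs.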

Answer 2

This might not be obvious at first, but the structure of your data is ideal for a plot with multiple time series. You don't even need to worry about the group_by function. Your dataset seems to have a long format, where the dates in the date column and the names in the activity column are not unique, but there is only one value per combination of activity and date.
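If you want to verify that assumption on your own data, a quick check along these lines might help. This is only a sketch: the toy data frame copies the six sample rows from the question, and the name duplicates is made up for illustration.

```r
library(dplyr)

# Toy data: the six sample rows from the question (not the full dataset)
new_data_2 <- data.frame(
  date = as.Date(c("2019-11-21", "2019-09-22", "2019-05-15",
                   "2019-01-23", "2019-07-04", "2019-07-18")),
  activity = factor(c("N11", "N11", "N22", "N22", "N22", "N22")),
  totaal = c(144L, 129L, 117L, 12L, 12L, 12L)
)

# Count rows per activity/date pair and keep only pairs that occur more than once
duplicates <- new_data_2 %>%
  count(activity, date) %>%
  filter(n > 1)

# Zero rows here means every activity/date pair is unique
nrow(duplicates)
```

If this returns rows for the real dataset, the values would need to be aggregated (for example with summarise()) before each activity can be drawn as a single line.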

Given the correct specifications, plot_ly() will group your data using color = ~activity, like this: p <- plot_ly(new_data_2, x = ~date, y = ~totaal, color = ~activity) %>% add_lines(). Since you haven't provided a data sample that is large enough, I'll use the economics_long dataset (which ships with ggplot2) to show you how you can do this. First of all, notice how the structure of my sampled dataset matches yours:

           date variable value
1    1967-07-01  psavert  12.5
2    1967-08-01  psavert  12.5
3    1967-09-01  psavert  11.7
4    1967-10-01  psavert  12.5
5    1967-11-01  psavert  12.5
6    1967-12-01  psavert  12.1
...

Plot:

Code:

library(plotly)
library(dplyr)

# data
data("economics_long")
df <- data.frame(economics_long)

# keep only some variables that have values on a comparable level
df <- df %>% filter(!(variable %in% c('pop', 'pce', 'unemploy')))

# plotly time series
p <- plot_ly(df, x = ~date, y = ~value, color = ~variable) %>%
  add_lines()

# show plot
p
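For comparison, here is roughly how the same call might look with the column names from the question. This is a sketch under assumptions: the toy data frame copies only the six sample rows shown in the question, and the ungroup() call is a precaution I'm adding because the printed tibble there is grouped (by newdatum) and plotly treats dplyr groups as separate lines.

```r
library(dplyr)
library(plotly)

# Toy data: the six sample rows from the question (not the full dataset)
new_data_2 <- data.frame(
  date = as.Date(c("2019-11-21", "2019-09-22", "2019-05-15",
                   "2019-01-23", "2019-07-04", "2019-07-18")),
  activity = factor(c("N11", "N11", "N22", "N22", "N22", "N22")),
  totaal = c(144L, 129L, 117L, 12L, 12L, 12L)
)

# Drop any existing dplyr grouping, then let color = ~activity split the lines
p <- new_data_2 %>%
  ungroup() %>%
  plot_ly(x = ~date, y = ~totaal, color = ~activity) %>%
  add_lines()
```

Note that the columns are referenced with formulas (~date, ~totaal) rather than new_data_2$..., so plotly looks them up in the data frame that was passed in.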

Answer 1
1 2020-01-08 16:23:25

Answer 2
0 2020-01-09 12:35:15
