
How to plot a line chart using R for time-series analysis

I am trying to plot a line chart in R of date-time against the number of tweets posted in each date-time period.

library(ggplot2)
df1 <- structure(list(Date = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Label = c("2020-03-12", 
            "2020-03-13"), class = "factor"), Time = structure(c(1L, 1L, 2L, 
            3L, 4L, 5L), .Label = c("00:00:00Z", "00:00:01Z", "00:10:04Z", 
            "00:25:12Z", "01:00:02Z"), class = "factor"), Text = structure(c(5L, 
            3L, 6L, 4L, 2L, 1L), .Label = c("The images of demonstrations and gathering", "Premium policy get activate by company abc", 
            "Launches of rocket", "Premium policy get activate by company abc", 
            "Technology makes trend", "The images of demonstrations and gatherings", 
            "Weather forecasting by xyz"), class = "factor")), class = "data.frame", row.names = c(NA, 
            -6L))
ggplot(df1, aes(x = Date, y = text(count)) + geom_line(aes(color = variable), size = 1)

I tried the above code to plot the desired result but got an error. The dataset is given in CSV format like this:

Date         Time                     Text
2020-03-12   00:00:00Z                The images of demonstrations and gatherings
2020-03-12   00:00:00Z                Premium policy get activate by company abc
2020-03-12   00:00:01Z                Weather forecasting by xyz 
2020-03-12   00:10:04Z                Technology makes trend
2020-03-12   00:25:12Z                Launches of rocket 
2020-03-12   01:00:02Z                Government launch new policy to different sector improvement

I have a dataset covering nearly 15 days and want to plot a line chart of the number of tweets (the tweets are in the Text column) to see how tweet volume trends across dates and times.

df1 <- structure(list(Date = structure(c(1L, 1L, 2L, 1L, 1L, 1L), .Label = c("3/12/2020", 
            "3/13/2020"), class = "factor"), Time = structure(c(1L, 1L, 2L, 
            3L, 4L, 5L), .Label = c("00:00:00Z", "00:00:01Z", "00:10:04Z", 
            "00:25:12Z", "01:00:02Z"), class = "factor"), Text = structure(c(5L, 
            3L, 6L, 4L, 2L, 1L), .Label = c("Government launch new policy to different sector", 
            "Launches of rocket", "Premium policy get activate by company abc", 
            "Technology makes trend", "The images of demonstrations and gatherings", 
            "Weather forecasting by xyz"), class = "factor"), X = structure(c(1L, 
            1L, 1L, 1L, 1L, 2L), .Label = c("", "improvement"), class = "factor")), class = "data.frame", row.names = c(NA, 
            -6L))                                                      

After creating the dataset df1 as above, running the following gives the required plot of tweet counts per hour of the day (pooled across all dates):

library(tidyverse)
library(lubridate)

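df1 %>% 
  mutate(Time = hms(as.character(Time)),   # parse "HH:MM:SSZ" strings into lubridate periods
         Date = mdy(as.character(Date)),   # parse "3/12/2020"-style dates (not used further)
         hour = hour(Time)) %>%            # hour of the day, 0-23
  count(hour) %>%                          # number of tweets (rows) per hour of the day
  ggplot(aes(hour, n, group = 1)) + geom_line() + geom_point()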

Is this what you are after?
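The code above counts tweets by hour of the day, so tweets from all 15 days that fall in, say, the 00:00 hour end up in the same bin. If the goal is the trend across dates, one option is to combine Date and Time into a single date-time and truncate it to the start of each calendar hour with `lubridate::floor_date()`. This is a minimal sketch, assuming character columns shaped like the question's CSV; the `tweets` data frame is made up for illustration.

```r
library(dplyr)
library(lubridate)

# Made-up tweets in the question's CSV layout: Date "YYYY-MM-DD", Time "HH:MM:SSZ"
tweets <- data.frame(
  Date = c("2020-03-12", "2020-03-12", "2020-03-12", "2020-03-13"),
  Time = c("00:00:00Z", "00:25:12Z", "01:00:02Z", "00:00:01Z"),
  stringsAsFactors = FALSE
)

hourly <- tweets %>%
  # Drop the trailing "Z" (UTC marker), then parse date + time as UTC
  mutate(dt = ymd_hms(paste(Date, sub("Z$", "", Time)), tz = "UTC"),
         # Truncate each timestamp to the start of its calendar hour
         hour_start = floor_date(dt, unit = "hour")) %>%
  # One row per calendar hour, with the number of tweets in column n
  count(hour_start)

print(hourly)
```

Unlike the hour-of-day version, 2020-03-12 00:00 and 2020-03-13 00:00 stay separate bins here. The result can be plotted with `ggplot(hourly, aes(hour_start, n)) + geom_line()`.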


library(dplyr)
library(lubridate)
library(stringr)
library(ggplot2)
library(patchwork)  # provides the p1/p2 layout used at the end

Answer with your data

To demonstrate the data wrangling:


# your data (from the question)
df1 <- structure(list(Date = structure(c(1L, 1L, 2L, 1L, 1L, 1L), 
                                       .Label = c("2020-03-12","2020-03-13"), 
                                       class = "factor"), 
                      Time = structure(c(1L, 1L, 2L,3L, 4L, 5L), 
                                       .Label = c("00:00:00Z", "00:00:01Z", "00:10:04Z","00:25:12Z", "01:00:02Z"),
                                       class = "factor"),
                      Text = structure(c(5L,3L, 6L, 4L, 2L, 1L),
                                       .Label = c("The images of demonstrations and gathering", "Premium policy get activate by company abc",
                                                  "Launches of rocket", "Premium policy get activate by company abc",
                                                  "Technology makes trend", "The images of demonstrations and gatherings", "Weather forecasting by xyz"), class = "factor")),
                 class = "data.frame", row.names = c(NA,-6L))

# data wrangle
df2 <- 
  df1 %>% 
  # change all variables from factors to character
  mutate_all(as.character) %>%
  mutate(Time = str_remove(Time, "Z$"),               # remove the trailing 'Z' from Time values
         dt = ymd_hms(paste(Date, Time, sep = " ")),  # parse the text into a date-time using lubridate::ymd_hms
         dt = ceiling_date(dt, unit = "hour")) %>%    # round up to the end of the hour; kept as a separate step for clarity
  group_by(dt) %>%  
  summarise(nr_tweets = n())

# plot

p1 <- ggplot(df2, aes(dt, nr_tweets))+
        geom_line()+
        scale_x_datetime(date_breaks = "1 day", date_labels = "%d/%m")+
        ggtitle("Data from question `df1`")
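The question's sample data has a long gap between the tweets on 12 March and the single tweet on 13 March, so `df2` has no rows for the hours in between and `geom_line()` simply draws a straight segment across them. A minimal sketch, assuming the tidyr package is available, that turns those missing hours into explicit zeros with `tidyr::complete()`; the `counts` data below is made up for illustration.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Made-up hourly counts with a gap: nothing was posted in the 03:00 and 04:00 hours
counts <- tibble(
  dt = ymd_hms(c("2020-03-12 01:00:00", "2020-03-12 02:00:00",
                 "2020-03-12 05:00:00"), tz = "UTC"),
  nr_tweets = c(3L, 1L, 2L)
)

filled <- counts %>%
  # Add a row for every hour from the first to the last observed hour;
  # hours that were missing get nr_tweets = 0 instead of NA
  complete(dt = seq(min(dt), max(dt), by = "hour"),
           fill = list(nr_tweets = 0L))

print(filled)
```

The same call should work on `df2` from the answer, since it also has a POSIXct `dt` column and an integer `nr_tweets` column; plotting the filled data makes the line drop to zero during quiet hours instead of interpolating between them.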


Answer with a made-up large dataset

tib <- tibble(dt = sample(seq(ISOdate(2020,05,01), ISOdate(2020,05,15), by = "sec"), 10000, replace = TRUE),
             text = sample(c(letters[1:26], LETTERS[1:26]), 10000, replace = TRUE))


tib1 <- 
  tib %>% 
  mutate(dt = round_date(dt, unit="hour"))%>% 
  group_by(dt) %>%  
  summarise(nr_tweets = n())


p2 <- ggplot(tib1, aes(dt, nr_tweets))+
        geom_line()+
        scale_x_datetime(date_breaks = "1 day", date_labels = "%d/%m")+
        ggtitle("Result using `tib` data made up to answer the question")
  

p1/p2

Created on 2020-05-13 by the reprex package (v0.3.0)
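The two examples above bin timestamps differently: the one using the question's data calls `ceiling_date()`, while the made-up dataset uses `round_date()`, so the same tweet can land in different hourly bins. A small illustration with a single hypothetical timestamp:

```r
library(lubridate)

# A hypothetical tweet posted 25 minutes past midnight (UTC)
x <- ymd_hms("2020-03-12 00:25:12", tz = "UTC")

down    <- floor_date(x, unit = "hour")    # start of its hour: 2020-03-12 00:00:00
up      <- ceiling_date(x, unit = "hour")  # end of its hour:   2020-03-12 01:00:00
nearest <- round_date(x, unit = "hour")    # nearest hour:      2020-03-12 00:00:00 (25 min < 30 min)
```

`floor_date()` labels each bin by the hour it starts at, `ceiling_date()` by the hour it ends at, and `round_date()` assigns each tweet to the nearest full hour; whichever is used, keeping it consistent makes the plotted counts comparable.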
