
Plot the count of purchases per day and month in R

The dataset records which customer (Cstid = customer ID) made a purchase on which day.

I am having difficulty plotting the number of purchases per day and per month.

Below is a sample of the dataset; in total I have 7505 observations.

  "Cstid"  "Date"
1  4195     19/08/17
2  3937     16/08/17
3  2163     07/09/17
4  3407     08/10/16
5  4576     04/11/16
6  3164     16/12/16
7  3174     18/08/15
8  1670     18/08/15
9  1671     18/08/15
10 4199     19/07/14
11 4196     19/08/14
12 6725     14/09/14
13 3471     14/09/13

I started by converting the Date column:

 df$Date <- as.Date(df$Date, '%d/%m/%Y')

Then I counted the number of observations per date using:

library(data.table)
dt <- as.data.table(df)
dt[,days:=format(Date,"%d.%m.%Y")]
dt1 <- data.frame(dt[,.N,by=days])

And tried to plot with:

plot(dt1$days, dt1$N,type="l")

But I get the following error message:

Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf

Could someone please tell me how I should proceed?

You need to specify a two-digit year using %y (lower case) in order to convert the Date column from character to class Date.
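As a quick check of the difference between the two format codes, here is a minimal base R sketch. The vector name `x` is only illustrative; the values are taken from the sample data.

```r
# Two-digit years as they appear in the dataset
x <- c("19/08/17", "08/10/16", "14/09/13")

# %y reads a two-digit year; on input, values 00-68 are mapped to 20xx
parsed <- as.Date(x, format = "%d/%m/%y")

class(parsed)          # "Date"
format(parsed, "%Y")   # "2017" "2016" "2013"
```

With %Y (upper case), R expects the year including the century, so "17" is not read as 2017.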

If ggplot2 is used for plotting, it can also do the aggregation: geom_bar() uses the count statistic by default, which spares us from computing the counts beforehand.

For aggregation by month, I recommend mapping all dates to the first day of their month, e.g., using lubridate::floor_date(). This keeps a continuous scale on the x-axis.

So, the complete code would be:

# convert Date from character to class Date using a 2 digit year
df$Date <- as.Date(df$Date, '%d/%m/%y')

library(ggplot2)
# aggregate by day
ggplot(df) + aes(x = Date) + 
  geom_bar()


# aggregate by month
ggplot(df) + aes(x = lubridate::floor_date(Date, "month")) + 
  geom_bar()
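If the counts themselves are needed, for example to keep the data.table approach from the question, the sketch below is one possible way to do it. In the question, `days` was built with format(), which returns character strings; that is most likely why plot() reported "NAs introduced by coercion". Here the grouping column stays of class Date. The names `per_day`, `per_month` and `Month` are illustrative, and the month is floored with base R instead of lubridate.

```r
library(data.table)

# A few sample rows from the question (two-digit years)
df <- data.frame(
  Cstid = c(3174, 1670, 1671, 4199, 4196, 6725),
  Date  = c("18/08/15", "18/08/15", "18/08/15", "19/07/14", "19/08/14", "14/09/14")
)

dt <- as.data.table(df)
dt[, Date := as.Date(Date, format = "%d/%m/%y")]

# Count purchases per day; Date remains class Date, not character
per_day <- dt[, .N, by = Date][order(Date)]

# Count purchases per month by mapping each date to the first of its month
per_month <- dt[, .N, by = .(Month = as.Date(format(Date, "%Y-%m-01")))][order(Month)]

# Base graphics accepts Date on the x-axis; type = "h" draws one spike per date
plot(per_day$Date, per_day$N, type = "h", xlab = "Date", ylab = "Purchases")
```

Spikes (type = "h") are used here instead of type = "l" because a line would suggest values between purchase dates that were never observed.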


Alternatively, the dates can be mapped to a character month, e.g., "2015-08". But this turns the x-axis into a discrete scale, which no longer shows the elapsed time between purchases:

# aggregate by month using format() to create discrete scale
ggplot(df) + aes(x = format(Date, "%Y-%m")) + 
  geom_bar()
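If readable month labels are the only reason for switching to format(), a possible middle ground is to keep the floored dates and let the scale format the labels. The sketch below assumes ggplot2 and lubridate are installed; the object name `p` and the axis titles are illustrative.

```r
library(ggplot2)

# A few sample dates from the question, parsed with a two-digit year
df <- data.frame(
  Date = as.Date(c("19/08/17", "16/08/17", "07/09/17", "08/10/16"),
                 format = "%d/%m/%y")
)

# Floor to the first of the month (continuous Date scale),
# then label the axis ticks as year-month
p <- ggplot(df) +
  aes(x = lubridate::floor_date(Date, "month")) +
  geom_bar() +
  scale_x_date(date_labels = "%Y-%m") +
  labs(x = "Month", y = "Purchases")

p
```

The x-axis stays continuous, so gaps between months with purchases remain visible, while the tick labels look like those of the format() variant.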


# Reproducible data:
df <- data.frame(
  Cstid = c(4195, 3937, 2163, 3407, 4576, 3164, 3174, 1670, 1671, 4199, 4196, 6725, 3471),
  Date = c('19/08/17', '16/08/17', '07/09/17', '08/10/16', '04/11/16', '16/12/16', '18/08/15',
           '18/08/15', '18/08/15', '19/07/14', '19/08/14', '14/09/14', '14/09/13'))
# Convert format: split "dd/mm/yy" and rebuild it as "20yy-mm-dd"
df$Date <- as.character(df$Date)
parts <- strsplit(df$Date, split = '/')
Y <- paste0('20', sapply(parts, function(x) x[3]))
M <- sapply(parts, function(x) x[2])
D <- sapply(parts, function(x) x[1])
df$Date <- as.POSIXct(paste(Y, M, D, sep = '-'), format = '%Y-%m-%d')
# Count per day plot:
days <- unique(df$Date)
dcount <- vector()
for (i in seq_along(days)) {
  dcount[i] <- nrow(df[df$Date == days[i], ])
}
library(ggplot2)
ggplot(data = data.frame(days, dcount), aes(x = days, y = dcount)) + geom_point()
# Count per month plot (months() returns the month name only, so the same
# month in different years is counted together):
df$month <- months(df$Date)
mon <- unique(df$month)
mcount <- vector()
for (i in seq_along(mon)) {
  mcount[i] <- nrow(df[df$month == mon[i], ])
}
ggplot(data.frame(mon, mcount), aes(x = mon, y = mcount)) + geom_point()
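The counting loops can probably be shortened with table(). The following base R sketch is an alternative, not the code from the answer; it groups months by a year-month key so that, for example, August 2015 and August 2017 are counted separately. The names `per_day` and `per_month` are illustrative.

```r
# Sample dates from the question, parsed with a two-digit year
df <- data.frame(
  Date = as.Date(c("19/08/17", "16/08/17", "07/09/17", "08/10/16", "04/11/16",
                   "16/12/16", "18/08/15", "18/08/15", "18/08/15", "19/07/14",
                   "19/08/14", "14/09/14", "14/09/13"),
                 format = "%d/%m/%y")
)

# Purchases per day: one table entry per distinct date
per_day <- table(df$Date)

# Purchases per month: key on "YYYY-MM" so years are kept apart
per_month <- table(format(df$Date, "%Y-%m"))

per_day[["2015-08-18"]]   # 3
per_month[["2017-08"]]    # 2
```

Either table can be turned into a data frame with as.data.frame() before passing it to ggplot().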
