Return a sampled date between the min date and max date in an R data frame
How can I return a sampled date between the minimum date and the maximum date as an additional column in an R data frame?
Course MinEnrollmentDate MaxEnrollmentDate
Maths 3/11/2016 3/4/2016
Chemistry 6/11/2016 6/4/2016
Physics 9/11/2016 9/4/2016
English 12/11/2016 12/4/2016
Science 3/11/2017 3/4/2017
Using dplyr, we can do:
library(dplyr)
df <- df %>%
  rowwise() %>%
  mutate(MinEnrollmentDate = as.Date(MinEnrollmentDate, format = '%m/%d/%Y'),
         MaxEnrollmentDate = as.Date(MaxEnrollmentDate, format = '%m/%d/%Y'),
         sampleDate = sample(seq(MinEnrollmentDate, MaxEnrollmentDate, '-1 day'), 1))
df
#> Source: local data frame [5 x 4]
#> Groups: <by row>
#>
#> # A tibble: 5 x 4
#> Course MinEnrollmentDate MaxEnrollmentDate sampleDate
#> <chr> <date> <date> <date>
#> 1 Maths 2016-03-11 2016-03-04 2016-03-08
#> 2 Chemistry 2016-06-11 2016-06-04 2016-06-09
#> 3 Physics 2016-09-11 2016-09-04 2016-09-06
#> 4 English 2016-12-11 2016-12-04 2016-12-09
#> 5 Science 2017-03-11 2017-03-04 2017-03-06
I'm not sure my date format is correct, since it is ambiguous; feel free to correct the format= part.
Data:
df <- read.table(text = 'Course MinEnrollmentDate MaxEnrollmentDate
Maths 3/11/2016 3/4/2016
Chemistry 6/11/2016 6/4/2016
Physics 9/11/2016 9/4/2016
English 12/11/2016 12/4/2016
Science 3/11/2017 3/4/2017', header = T, stringsAsFactors = F)
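To make the ambiguity mentioned above concrete, here is a small sketch of how the same string parses under the two candidate formats. The variable names are illustrative only, and which format is right depends on how the data was recorded.

```r
# The same text parsed under two candidate formats
x <- "3/11/2016"
as_mdy <- as.Date(x, format = "%m/%d/%Y")  # read as month/day/year
as_dmy <- as.Date(x, format = "%d/%m/%Y")  # read as day/month/year
format(as_mdy)  # "2016-03-11"
format(as_dmy)  # "2016-11-03"
```

Under either reading, MinEnrollmentDate in the sample data falls after MaxEnrollmentDate, which is why seq() is given a '-1 day' step in the dplyr code.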
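You can compute the number of days between the two dates:
days <- as.integer(as.Date(data$MaxEnrollmentDate, format="%d/%m/%Y") - as.Date(data$MinEnrollmentDate, format="%d/%m/%Y"))
Then add to MinEnrollmentDate a random offset between 0 and that number of days, drawn with the function sample(). The difference is signed, so this also works for the sample data, where MinEnrollmentDate is the later of the two dates:
for(i in seq_along(days)) {
  # 0:days[i] counts down when days[i] is negative, so the result stays inside the range
  data[i, "sampleDate"] <- as.character(as.Date(data$MinEnrollmentDate[i], format="%d/%m/%Y") + sample(0:days[i], 1))
}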
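Assuming you are working with a data frame named mydata, you can use the following snippet (the %m/%d/%Y format is an assumption; adjust it to your data):
min_d <- as.Date(mydata$MinEnrollmentDate, format = "%m/%d/%Y")
max_d <- as.Date(mydata$MaxEnrollmentDate, format = "%m/%d/%Y")
mydata$sampledate <- as.Date(sapply(seq_along(min_d), function(i) {
  sample(seq(min(min_d[i], max_d[i]), max(min_d[i], max_d[i]), by = "day"), 1)
}), origin = "1970-01-01")
Basically, for each row this first generates a sequence of every day from the earlier date to the later date, then draws a random sample of size 1 from that sequence and writes it to your data frame.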
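The same idea (pick one day uniformly from the inclusive range) can also be written without building a sequence for every row. The sketch below is only one possible variant: sample_between is a hypothetical helper name, and the %m/%d/%Y format is assumed rather than confirmed by the question.

```r
# Hypothetical helper: one random date per element, inclusive of both
# endpoints, whichever of the two dates is the earlier one.
sample_between <- function(from, to) {
  span <- as.integer(to - from)                           # signed number of days
  offset <- floor(runif(length(span)) * (abs(span) + 1))  # integer in 0..|span|
  from + sign(span) * offset                              # step towards `to`
}

set.seed(1)  # only to make the example repeatable
min_date <- as.Date(c("3/11/2016", "6/11/2016"), format = "%m/%d/%Y")
max_date <- as.Date(c("3/4/2016", "6/4/2016"), format = "%m/%d/%Y")
sampled <- sample_between(min_date, max_date)

# In this data max_date is the earlier bound, so check against it first
all(sampled >= max_date & sampled <= min_date)
```

Because the offset is computed with vector arithmetic, no loop or rowwise() step is needed, which may matter for larger data frames.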
For completeness, a step-by-step lubridate solution (using GGamba's df):
if (!require(lubridate)) {
  install.packages("lubridate")
  library(lubridate)  # load the package once it has been installed
}
df <- read.table(text = 'Course MinEnrollmentDate MaxEnrollmentDate
Maths 3/11/2016 3/4/2016
Chemistry 6/11/2016 6/4/2016
Physics 9/11/2016 9/4/2016
English 12/11/2016 12/4/2016
Science 3/11/2017 3/4/2017', header = T, stringsAsFactors = F)
min_date <- as.Date(df$MinEnrollmentDate, format = "%d/%m/%Y")
max_date <- as.Date(df$MaxEnrollmentDate, format = "%d/%m/%Y")
# Signed day count: negative here, because MinEnrollmentDate is the later date
no_days <- as.integer(max_date - min_date)
# Offset between 0 and no_days, so the result never leaves the range
random_days <- sapply(no_days, function(x) sample(x = 0:x, size = 1))
df$random_date <- min_date + days(random_days)