Vectorizing a for loop in R to create strings of different lengths

Question

I have created a sample R script to illustrate my question:

test.df <- data.frame(uid=c('x001','x002','x003'),
                      start_date=c('2015-01-02','2015-03-05','2015-07-09'),
                      end_date=c('2015-01-07','2015-03-07','2015-07-16'),
                      stringsAsFactors=FALSE) 
test.df[,'start_date'] <- as.Date(test.df[,'start_date']) 
test.df[,'end_date'] <- as.Date(test.df[,'end_date']) 
for (loop in (1:nrow(test.df))) {   
    test.df[loop,'output'] <- paste(seq(test.df[loop,'start_date'],test.df[loop,'end_date'],by = 1),collapse=';') 
}

I need to create date strings of varying length. A for loop is the only approach I can think of, but I have about 70K cases to process. Is there any way to speed this up?

Update 01

Thanks @akrun for the answer. I have extended my question as follows:

library(dplyr)

test.df <- data.frame(uid=c('x001','x002','x003'),
                      start_date=c('2015-01-02','2015-03-05','2015-07-09'),
                      end_date=c('2015-01-07','2015-03-07','2015-07-16'),
                      stringsAsFactors=FALSE)
test.df[,'start_date'] <- as.Date(test.df[,'start_date'])
test.df[,'end_date'] <- as.Date(test.df[,'end_date'])

# Part A
for (loop in (1:nrow(test.df))) {   
  test.df[loop,'output'] <- paste(seq(test.df[loop,'start_date'],test.df[loop,'end_date'],by = 1),collapse=';') 
}

# Part B
test.mod <- group_by(test.df,uid) %>%
  do({df <- data.frame(.)
  output.df <- data.frame(uid=df[1,'uid'],
                          date=unlist(strsplit(df[,'output'],';')))
  data.frame(output.df)
  })

Part A is now solved, but is there any way to speed up Part B? Or should I combine Part A and Part B? Please enlighten me, as data.table is new to me.

Answer 1

We can convert 'test.df' to a data.table with setDT(test.df). Then, grouped by 'uid', we build the seq from 'start_date' to 'end_date' and paste the elements together.

library(data.table)
setDT(test.df)[,paste(seq(start_date, end_date, by = '1 day'), collapse=';') , uid]
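If you want the pasted string stored as a named column on the table itself, something like the following should work. This is only a sketch: the name test.dt is mine, and it assumes every uid appears on exactly one row (with repeated uids, seq() would receive more than one start date per group and fail).

```r
library(data.table)

# Same sample data as in the question, built directly as a data.table
test.dt <- data.table(uid        = c('x001', 'x002', 'x003'),
                      start_date = as.Date(c('2015-01-02', '2015-03-05', '2015-07-09')),
                      end_date   = as.Date(c('2015-01-07', '2015-03-07', '2015-07-16')))

# := adds the 'output' column by reference; by = uid evaluates j once per uid
test.dt[, output := paste(seq(start_date, end_date, by = '1 day'), collapse = ';'),
        by = uid]
```

Because := modifies test.dt in place, there is no need to assign the result back.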

Update

For Part B, if we don't paste, the result is a two-column dataset:

setDT(test.df)[,seq(start_date, end_date, by = '1 day') , uid]
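In other words, Part A and Part B can be combined by expanding the dates directly instead of pasting and then splitting again. A minimal self-contained sketch, where the names test.dt, long.dt and the column name date are my own choices and each uid is again assumed to occupy a single row:

```r
library(data.table)

test.dt <- data.table(uid        = c('x001', 'x002', 'x003'),
                      start_date = as.Date(c('2015-01-02', '2015-03-05', '2015-07-09')),
                      end_date   = as.Date(c('2015-01-07', '2015-03-07', '2015-07-16')))

# One row per uid per day; wrapping j in .() lets us name the new column
long.dt <- test.dt[, .(date = seq(start_date, end_date, by = '1 day')), by = uid]
```

Unlike the strsplit route in Part B, the date column here stays a Date rather than becoming character.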

Answer 2

Here is how you can do it with apply:

test.df <- data.frame(uid=c('x001','x002','x003'),
                      start_date=c('2015-01-02','2015-03-05','2015-07-09'),
                      end_date=c('2015-01-07','2015-03-07','2015-07-16'),
                      stringsAsFactors=FALSE) 

test.df$output <- apply(test.df, 1, function(x) paste(seq(as.Date(x[2]), as.Date(x[3]), by = 1), collapse=';'))
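Note that apply() first coerces the data frame to a character matrix, which is why the dates are picked by position (x[2], x[3]) and converted with as.Date inside the function. As a possible variation, not part of the answer above and not benchmarked, mapply() can walk the two date columns in parallel once they have been converted to Date:

```r
test.df <- data.frame(uid = c('x001', 'x002', 'x003'),
                      start_date = c('2015-01-02', '2015-03-05', '2015-07-09'),
                      end_date = c('2015-01-07', '2015-03-07', '2015-07-16'),
                      stringsAsFactors = FALSE)
test.df$start_date <- as.Date(test.df$start_date)
test.df$end_date <- as.Date(test.df$end_date)

# mapply() calls the function once per (start_date, end_date) pair
test.df$output <- mapply(function(from, to) paste(seq(from, to, by = 1), collapse = ';'),
                         test.df$start_date, test.df$end_date)
```

This refers to the columns by name, so it does not depend on column order.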

Answer 1: 2 votes, accepted, 2015-10-29 04:11:23
Answer 2: 0 votes, 2015-10-29 04:48:23
