Match dates and ids only once between data.frames

Question

I have two example data frames, as follows:

id<-c(1,2,3,1,4,3,5)
date<-c("2011-1-1","2011-1-1","2011-2-2","2012-3-3","2012-4-4","2012-5-5","2012-6-6")
d<-data.frame(cbind(id,date))
colnames(d)<-c("id","date")
d$w<-do.call(paste,c(d[c("id","date")],sep=" "))

id<-c(7,8,9,10,7,10,8,10,11,12)
date<-c("2011-1-1","2011-1-1","2011-2-2","2012-3-3","2012-3-3","2012-4-4","2012-4-4","2012-5-5","2012-6-6","2012-6-6")
contr<-data.frame(cbind(id,date))
colnames(contr)<-c("id","date")
contr$w<-do.call(paste,c(contr[c("id","date")],sep=" "))

Note that ids and dates are repeated within both datasets, that the ids in d$id are all different from those in contr$id, and that all contr$date are %in% d$date. What I want is y, a vector containing ONE contr$w FOR EACH d$id that has a contr$date %in% d$date.

I have tried the following, which does not work, and I am sure there must be a much simpler and better way to do it.

y<-0
for(i in length(levels(factor(d$w)))){
   for(j in length(levels(factor(contr$w)))){
     z<-ifelse(d$date[i]==contr$date[j],contr$w[j],NA)
     y<-c(y,z)
     y<-subset(y,!is.na(y))
  }
}

Can anyone help? Many thanks, Marco

Answer 1

This did what I wanted; maybe I was not clear enough in my explanation. I just wanted a random date per id (then I can create the w column). I solved this using a solution from this other question:

Random row selection in R

Many thanks for the effort anyway! Marco
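A minimal sketch of drawing one random row per id in base R, using the contr data from the question. It is not necessarily the approach from the linked question, and the names picked and one_per_id are illustrative only.

```r
# Rebuild the question's control data (ids kept numeric here).
contr <- data.frame(
  id   = c(7, 8, 9, 10, 7, 10, 8, 10, 11, 12),
  date = c("2011-1-1", "2011-1-1", "2011-2-2", "2012-3-3", "2012-3-3",
           "2012-4-4", "2012-4-4", "2012-5-5", "2012-6-6", "2012-6-6"),
  stringsAsFactors = FALSE
)

set.seed(1)  # only so the random draw is reproducible

# Group row indices by id, then pick one index per group at random.
# Indexing with sample.int() avoids the sample(x, 1) surprise when x is a
# single number (sample(5, 1) draws from 1:5, not from the value 5).
picked <- vapply(
  split(seq_len(nrow(contr)), contr$id),
  function(rows) rows[sample.int(length(rows), 1)],
  integer(1)
)

one_per_id <- contr[picked, ]
one_per_id$w <- paste(one_per_id$id, one_per_id$date, sep = " ")
```

Each id appears exactly once in one_per_id, with a date drawn at random from the dates recorded for that id.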

Answer 2

I have now written a loop that does this (the previous answer did not work because some cases in d had no matching date in contr). It is very slow, but it does exactly what I wanted:

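# Assumes d and contr carry `rownames` and `practice` columns, and that
# control.2 already exists with one row per row of d (not shown in the question).
for (i in seq_along(d$rownames)) {
  if (any(contr$w %in% d$w[i])) {
    # Exact match on w: draw one matching control at random.
    control.2$rownames[i] <- sample(contr$rownames[contr$w == d$w[i]], 1)
  } else {
    # No exact match: fall back to the same-practice control(s) closest in date.
    z <- contr[contr$practice == d$practice[i], ]
    z$tempo <- abs(difftime(z$date, d$date[i], units = "days"))
    z <- z[!is.na(z$tempo), ]
    z <- z[z$tempo == min(z$tempo), ]
    control.2$rownames[i] <- sample(z$rownames, 1)
  }
  # Drop the chosen control so it cannot be matched again.
  contr <- contr[!contr$rownames %in% control.2$rownames[i], ]
}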

Not the best code, I am sure, but it works. The else branch handles the few cases where no control had a matching date; for those I sample() one from the controls with the closest date. If you can come up with a faster version, that would be nice. My datasets are about 5K rows for d and about 2.5 million rows for contr, and the loop takes roughly 2 hours to run. Painful but worth the wait!
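One possible direction for a faster version, sketched on the question's example data. It covers only the exact-date step, and it assumes matching on date alone (not on w or practice). The closest-date fallback is left out, and the names ctrl_by_date, case_by_date, match_row and control_w are illustrative. The idea is to group the control rows by date once, rather than rescanning and shrinking contr for every row of d, which is probably where much of the time goes.

```r
d <- data.frame(
  id   = c(1, 2, 3, 1, 4, 3, 5),
  date = c("2011-1-1", "2011-1-1", "2011-2-2", "2012-3-3",
           "2012-4-4", "2012-5-5", "2012-6-6"),
  stringsAsFactors = FALSE
)
contr <- data.frame(
  id   = c(7, 8, 9, 10, 7, 10, 8, 10, 11, 12),
  date = c("2011-1-1", "2011-1-1", "2011-2-2", "2012-3-3", "2012-3-3",
           "2012-4-4", "2012-4-4", "2012-5-5", "2012-6-6", "2012-6-6"),
  stringsAsFactors = FALSE
)

set.seed(1)  # only so the random draw is reproducible

# Row indices grouped by date, built once up front.
ctrl_by_date <- split(seq_len(nrow(contr)), contr$date)
case_by_date <- split(seq_len(nrow(d)), d$date)

# match_row[i] will hold the row of contr assigned to row i of d (NA if none).
match_row <- rep(NA_integer_, nrow(d))
for (dt in names(case_by_date)) {
  cases <- case_by_date[[dt]]
  pool  <- ctrl_by_date[[dt]]  # NULL when no control shares this date
  if (is.null(pool)) next
  k <- min(length(cases), length(pool))
  # Draw k distinct controls so that no control row is used twice.
  match_row[cases[seq_len(k)]] <- pool[sample.int(length(pool), k)]
}

d$control_w <- ifelse(
  is.na(match_row), NA_character_,
  paste(contr$id[match_row], contr$date[match_row], sep = " ")
)
```

Rows of d that end up with NA in match_row would still need the closest-date fallback from the loop above, applied to the controls that remain unused.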

2 solutions

Solution 1
0 2014-06-23 10:12:11

Solution 2
0 Accepted 2014-06-30 21:14:30
