Merging pairs of data frames in sequence in R

I have a data frame of tagged individuals recorded at multiple sites over multiple sampling intervals. See the example below:

> df
   Tag   Site Interval Ind_ID
1  507 Golden        7      1
2  507 Golden        8      1
3  552 Golden        2      1
4  552 Golden        1      1
5  847 Golden        4      1
6  847 Golden        6      1
8  847 Golden        5      1
9  847 Golden        3      1
31 541 Golden        1      1
33 541 Golden        3      1
34 541 Golden        4      1
35 541 Golden        7      1
36 541 Golden        6      1
37 541 Golden        5      1
39 810 Golden        7      1
40 810 Golden        8      1
41 840 Golden        7      1
42 840 Golden        8      1
43 840 Golden        3      1
44 840 Golden        2      1

What I'm trying to do is separate the tagged individuals by interval, which I've done using this for loop:

for (i in 1:nlevels(factor(df$Interval))){
  I<-subset(df,Interval==levels(factor(df$Interval))[i])
  assign(paste("Interval_", i, sep = ""), I)}

I then merge the data frames in consecutive pairs, which I'm currently doing with this code:

IPl2<-merge(Interval_1, Interval_2, by=c("Tag", "Site", "Ind_ID"))
IPl3<-merge(Interval_2, Interval_3, by=c("Tag", "Site", "Ind_ID"))
IPl4<-merge(Interval_3, Interval_4, by=c("Tag", "Site", "Ind_ID"))
IPl5<-merge(Interval_4, Interval_5, by=c("Tag", "Site", "Ind_ID"))
IPl6<-merge(Interval_5, Interval_6, by=c("Tag", "Site", "Ind_ID"))
IPl7<-merge(Interval_6, Interval_7, by=c("Tag", "Site", "Ind_ID"))
IPl8<-merge(Interval_7, Interval_8, by=c("Tag", "Site", "Ind_ID"))

I'm sure there's a more efficient way of doing this. Also, I'm continually adding data to the data set (i.e. more intervals), and I would like to avoid having to edit the code each time new data is added. Any ideas?

Maybe something like this:

dfs <- split(df,df$Interval)
n <- nlevels(factor(df$Interval))-1
results <- setNames(vector("list",length = n),paste0("IPl",2:(n+1)))
for (i in seq_len(n)){
    results[[i]] <- merge(dfs[[i]],dfs[[i+1]],by = c('Tag','Site','Ind_ID'))
}

> head(results)

$IPl2
  Tag   Site Ind_ID Interval.x Interval.y
1 552 Golden      1          1          2

$IPl3
  Tag   Site Ind_ID Interval.x Interval.y
1 840 Golden      1          2          3

$IPl4
  Tag   Site Ind_ID Interval.x Interval.y
1 541 Golden      1          3          4
2 847 Golden      1          3          4

$IPl5
  Tag   Site Ind_ID Interval.x Interval.y
1 541 Golden      1          4          5
2 847 Golden      1          4          5

$IPl6
  Tag   Site Ind_ID Interval.x Interval.y
1 541 Golden      1          5          6
2 847 Golden      1          5          6

$IPl7
  Tag   Site Ind_ID Interval.x Interval.y
1 541 Golden      1          6          7
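The same pairwise merge can probably be written without an explicit loop. The following is only a sketch, not part of the answer above: it uses a made-up six-row subset of the question's data, and the names `keys` and `pairs` are illustrative. It walks the split list and the same list shifted by one with `Map()`.

```r
# Small illustrative subset with the same columns as the question's df.
df <- data.frame(
  Tag      = c(552, 552, 840, 840, 541, 541),
  Site     = "Golden",
  Interval = c(2, 1, 3, 2, 3, 4),
  Ind_ID   = 1
)

# One data frame per interval, in increasing interval order.
dfs <- split(df, df$Interval)

# Columns that identify an individual (assumed from the question's merge calls).
keys <- c("Tag", "Site", "Ind_ID")

# Pair each list element with its successor: (1, 2), (2, 3), (3, 4).
pairs <- Map(function(a, b) merge(a, b, by = keys),
             head(dfs, -1), tail(dfs, -1))

# Name each result after the later interval of the pair.
names(pairs) <- paste0("IPl", names(dfs)[-1])
```

Because the names are taken from the interval values themselves, they keep matching the data when new intervals are added. Note that, like the loop, this pairs neighbouring list elements, so if an interval is missing from the data the two merged intervals will not be consecutive.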

Below is a dplyr solution that joins the data frame with itself and puts the results in a single data frame.

library(dplyr)
## Join the 'df' to itself based on the intervals to compare; this is done by
## creating a key to indicate which intervals to join on.
resultsdf <-
    ## Create match_interval to next sequential value
    df %>% mutate(match_interval = paste0('IPl', as.numeric(Interval)+1)) %>% arrange(Interval, Site) %>%
    ## Join to self by match_interval and other columns.
    inner_join(df %>% mutate(match_interval = paste0('IPl', as.numeric(Interval))),
               by = c('Tag', 'Site', 'Ind_ID', 'match_interval')) %>%
    ## Order columns
    select(match_interval, Tag, Site, Ind_ID, Interval.x, Interval.y)


resultsdf

##    match_interval Tag   Site Ind_ID Interval.x Interval.y
## 1            IPl2 552 Golden      1          1          2
## 2            IPl3 840 Golden      1          2          3
## 3            IPl4 847 Golden      1          3          4
## 4            IPl4 541 Golden      1          3          4
## 5            IPl5 847 Golden      1          4          5
## 6            IPl5 541 Golden      1          4          5
## 7            IPl6 847 Golden      1          5          6
## 8            IPl6 541 Golden      1          5          6
## 9            IPl7 541 Golden      1          6          7
## 10           IPl8 507 Golden      1          7          8
## 11           IPl8 810 Golden      1          7          8
## 12           IPl8 840 Golden      1          7          8
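For readers without dplyr, the self-join idea can likely be reproduced with base R's `merge()`. This is a rough sketch under assumptions, not the answer's code: the data is a made-up six-row subset of the question's `df`, and the helper column `pair` and the names `left`, `right` and `out` are hypothetical.

```r
# Small illustrative subset with the same columns as the question's df.
df <- data.frame(
  Tag      = c(552, 552, 840, 840, 541, 541),
  Site     = "Golden",
  Interval = c(2, 1, 3, 2, 3, 4),
  Ind_ID   = 1
)

keys <- c("Tag", "Site", "Ind_ID")

# Left copy: each row looks for a partner in the next interval.
left  <- data.frame(df[keys], Interval.x = df$Interval, pair = df$Interval + 1)
# Right copy: each row is a candidate partner for its own interval.
right <- data.frame(df[keys], Interval.y = df$Interval, pair = df$Interval)

# Inner join on the individual plus the interval the pair ends at.
out <- merge(left, right, by = c(keys, "pair"))

# Label and order the rows in the same style as the dplyr result.
out$match_interval <- paste0("IPl", out$pair)
out <- out[order(out$pair, out$Tag),
           c("match_interval", keys, "Interval.x", "Interval.y")]
```

Since the join key is the interval value plus one, only truly consecutive intervals are matched, even when some interval is absent from the data.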
