Continue row numbers from one data frame in a separate data frame

I have two data frames, df1 and df2:

df1<-structure(list(protocol_no = c("study5", "study5", 
"study5", "study5", "study5", "study5","study6"), 
    sequence_number = c("1", "15", "73", "42", "2", "9","5021")), row.names = c(NA, 
-7L), class = c("tbl_df", "tbl", "data.frame"))

df2<-structure(list(record_id = c(11, 12, 13, 14, 15, 16), protocol_no = c("study5", 
"study5", "study5", "study5", "study5", "study5"
), sequence_number = c("1", "15", "73", "42", "2", "9"), form_1_complete = c(0, 
0, 0, 0, 0, 0)), row.names = c(NA, 6L), class = "data.frame")

You can mostly ignore the contents; I just made up some names and numbers. The key points are that df2 has more columns than df1, and that the real data sets will have 27,000+ rows.

df1 will always have slightly more rows than df2 because it has newer data.

What I'm trying to do is find the rows that exist in df1 but not in df2 and isolate them. I know I could do this with anti_join(); the problem is that I also want to include the record_id column from df2, and I want its numbering to continue from wherever df2 left off.

So in this case, the df1 row "study6, 5021" would be the new row, and it would be numbered record_id = 17 (because df2 left off at 16), and my output would look like this:

[Image: expected output]

We could bind the data, keep the distinct rows, and update the record_id:

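library(dplyr)
library(tidyr)
library(data.table)  # only needed for rowid()

bind_rows(df2, df1) %>% 
  # keep the first occurrence of each key; df2 rows come first, so they win
  distinct(protocol_no, sequence_number, .keep_all = TRUE) %>% 
  # rows that came only from df1 have NA here; carry the last df2 values down
  fill(record_id, form_1_complete) %>% 
  # rowid() counts repeats of each record_id (1, 2, ...), so filled rows become 17, 18, ...
  mutate(record_id = record_id + (rowid(record_id) - 1))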

Output:

  record_id protocol_no sequence_number form_1_complete
1        11      study5               1               0
2        12      study5              15               0
3        13      study5              73               0
4        14      study5              42               0
5        15      study5               2               0
6        16      study5               9               0
7        17      study6            5021               0
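
Since the question asks to isolate only the new rows, an anti_join()-based variant may also be worth a look. This is a minimal sketch rather than a tested solution for the real 27,000-row data. It assumes that protocol_no and sequence_number together identify a row, that record_id in df2 is numeric, and that new rows should get form_1_complete = 0 (my assumption, mirroring the filled value above). The names keys and new_rows are only illustrative.

```r
library(dplyr)

# Example data, equivalent to the structures in the question
df1 <- data.frame(
  protocol_no     = c(rep("study5", 6), "study6"),
  sequence_number = c("1", "15", "73", "42", "2", "9", "5021")
)
df2 <- data.frame(
  record_id       = c(11, 12, 13, 14, 15, 16),
  protocol_no     = rep("study5", 6),
  sequence_number = c("1", "15", "73", "42", "2", "9"),
  form_1_complete = rep(0, 6)
)

# Columns assumed to identify a row in both data frames
keys <- c("protocol_no", "sequence_number")

new_rows <- df1 %>%
  # rows of df1 with no matching key in df2
  anti_join(df2, by = keys) %>%
  mutate(
    # continue numbering from the largest record_id already in df2
    record_id = max(df2$record_id) + row_number(),
    # assumed default for rows that are not in df2 yet
    form_1_complete = 0
  ) %>%
  # same column order as df2
  select(all_of(names(df2)))

new_rows
```

With the example data, new_rows holds the single study6 row with record_id 17. If the full table is wanted as well, bind_rows(df2, new_rows) should give the same seven rows as the output above.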
