Combine columns based on sequential range R

I have two data frames that I want to merge.

Jak
1
2
3
4
5
6
7
8
9
10


Start    Stop    ID    Info
1        3       Ab    Abacus
7        10      Bc    Because   

I would like the final data frame to be:

Jak  ID    Info
1    Ab    Abacus
2    Ab    Abacus
3    Ab    Abacus
4
5
6
7    Bc    Because
8    Bc    Because
9    Bc    Because
10   Bc    Because

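Each sequence number in Jak is matched against the Start and Stop columns: if the number falls within that range (inclusive), the values from the ID and Info columns are added to its row; otherwise those columns stay empty.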

In base R, you can use merge after creating a seq from Start to Stop for each row:

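# Map(seq, ...) always returns a list; mapply(seq, ...) would simplify to a
# matrix when every range has the same length.
merge(x, do.call(rbind, Map(data.frame, Jak=Map(seq, y$Start, y$Stop), ID=y$ID,
  Info=y$Info)), all.x=TRUE)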
#   Jak   ID    Info
#1    1   Ab  Abacus
#2    2   Ab  Abacus
#3    3   Ab  Abacus
#4    4 <NA>    <NA>
#5    5 <NA>    <NA>
#6    6 <NA>    <NA>
#7    7   Bc Because
#8    8   Bc Because
#9    9   Bc Because
#10  10   Bc Because
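
To make the nested one-liner easier to follow, here is a minimal sketch that splits it into two steps: first expand every Start to Stop range into one row per value, then left-join that table onto the sequence. The names `lookup` and `res` are introduced here for illustration only and are not part of the original answer; the data frames are rebuilt inline so the snippet runs on its own.

```r
# Same data as used in this answer, rebuilt inline
x <- data.frame(Jak = 1:10)
y <- data.frame(Start = c(1, 7), Stop = c(3, 10),
                ID = c("Ab", "Bc"), Info = c("Abacus", "Because"))

# Step 1: expand each Start..Stop range into one row per value.
# data.frame() recycles the single ID/Info value across the range.
lookup <- do.call(rbind, Map(data.frame,
                             Jak = Map(seq, y$Start, y$Stop),
                             ID = y$ID, Info = y$Info))

# Step 2: left join on the shared Jak column; values of Jak that fall
# outside every range get NA in ID and Info.
res <- merge(x, lookup, all.x = TRUE)
```

Inspecting `lookup` before the merge is a quick way to check that the ranges expanded as intended.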

Data:

x <- data.frame(Jak=1:10)
y <- read.table(header=TRUE, text="Start    Stop    ID    Info
1        3       Ab    Abacus
7        10      Bc    Because")

Does this work:

library(dplyr)
library(tidyr)
library(purrr)
df2 %>%
  mutate(Jak = map2(Start, Stop, `:`)) %>%  # list column holding Start:Stop
  unnest(Jak) %>%                            # one row per value in each range
  select(Jak, ID, Info) %>%
  right_join(df1, by = "Jak") %>%            # keep every Jak present in df1
  arrange(Jak)
# A tibble: 10 x 3
     Jak ID    Info   
   <dbl> <chr> <chr>  
 1     1 Ab    Abacus 
 2     2 Ab    Abacus 
 3     3 Ab    Abacus 
 4     4 NA    NA     
 5     5 NA    NA     
 6     6 NA    NA     
 7     7 Bc    Because
 8     8 Bc    Because
 9     9 Bc    Because
10    10 Bc    Because

Data used:

df1
# A tibble: 10 x 1
     Jak
   <dbl>
 1     1
 2     2
 3     3
 4     4
 5     5
 6     6
 7     7
 8     8
 9     9
10    10
df2
# A tibble: 2 x 4
  Start  Stop ID    Info   
  <dbl> <dbl> <chr> <chr>  
1     1     3 Ab    Abacus 
2     7    10 Bc    Because

Assuming the Stop value in the second data frame is 10, you can use fuzzyjoin:

fuzzyjoin::fuzzy_left_join(df1, df2, by = c('Jak' = 'Start', 'Jak' = 'Stop'), 
                           match_fun = list(`>=`, `<=`))

#   Jak Start Stop   ID    Info
#1    1     1    3   Ab  Abacus
#2    2     1    3   Ab  Abacus
#3    3     1    3   Ab  Abacus
#4    4    NA   NA <NA>    <NA>
#5    5    NA   NA <NA>    <NA>
#6    6    NA   NA <NA>    <NA>
#7    7     7   10   Bc Because
#8    8     7   10   Bc Because
#9    9     7   10   Bc Because
#10  10     7   10   Bc Because
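
For readers without fuzzyjoin installed, the same `Start <= Jak <= Stop` condition can be sketched in base R. This is only an illustration of the matching rule, not the fuzzyjoin implementation. It assumes the ranges do not overlap, because it keeps only the first matching row, whereas a fuzzy join would return one row per match. The names `idx` and `out` are hypothetical.

```r
df1 <- data.frame(Jak = 1:10)
df2 <- data.frame(Start = c(1, 7), Stop = c(3, 10),
                  ID = c("Ab", "Bc"), Info = c("Abacus", "Because"))

# For each Jak, find the first row of df2 whose [Start, Stop] range contains it.
# which(...)[1] is NA when no range matches.
idx <- vapply(df1$Jak,
              function(j) which(df2$Start <= j & j <= df2$Stop)[1],
              integer(1))

# Indexing rows with NA yields all-NA rows, which mimics a left join.
out <- cbind(df1, df2[idx, c("ID", "Info")])
rownames(out) <- NULL
```

The explicit index vector makes it easy to see which rows found no range, at the cost of scanning `df2` once per value of Jak.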

