Combine columns based on sequential range in R
I have two data frames that I want to merge.
Jak
1
2
3
4
5
6
7
8
9
10
Start Stop ID Info
1 3 Ab Abacus
7 10 Bc Because
I want the final data frame to be:
Jak ID Info
1 Ab Abacus
2 Ab Abacus
3 Ab Abacus
4
5
6
7 Bc Because
8 Bc Because
9 Bc Because
10 Bc Because
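Each sequence number in Jak should be matched against the Start and Stop columns: if the number falls within that range, the ID and Info values from that row are added.
In base R, you can use merge after creating a seq from Start to Stop: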
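# Expand each Start..Stop range into one row per Jak value, then left-join onto x
merge(x, do.call(rbind, Map(data.frame, Jak = mapply(seq, y$Start, y$Stop, SIMPLIFY = FALSE),
                            ID = y$ID, Info = y$Info)), all.x = TRUE)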
# Jak ID Info
#1 1 Ab Abacus
#2 2 Ab Abacus
#3 3 Ab Abacus
#4 4 <NA> <NA>
#5 5 <NA> <NA>
#6 6 <NA> <NA>
#7 7 Bc Because
#8 8 Bc Because
#9 9 Bc Because
#10 10 Bc Because
Data:
x <- data.frame(Jak=1:10)
y <- read.table(header=TRUE, text="Start Stop ID Info
1 3 Ab Abacus
7 10 Bc Because")
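The one-liner above packs four steps together. The sketch below splits them apart so each intermediate result can be inspected; it uses the same x and y as the data above. SIMPLIFY = FALSE is added so mapply() always returns a list, even if every range happened to have the same length (otherwise it would simplify to a matrix).

```r
x <- data.frame(Jak = 1:10)
y <- read.table(header = TRUE, text = "Start Stop ID Info
1 3 Ab Abacus
7 10 Bc Because")

# Step 1: one integer vector per row of y, covering Start..Stop
ranges <- mapply(seq, y$Start, y$Stop, SIMPLIFY = FALSE)

# Step 2: turn each range into a small data frame; ID and Info are recycled
pieces <- Map(data.frame, Jak = ranges, ID = y$ID, Info = y$Info)

# Step 3: stack the pieces into one long lookup table (7 rows here)
expanded <- do.call(rbind, pieces)

# Step 4: left join on the shared column Jak; Jak values with no range get NA
result <- merge(x, expanded, by = "Jak", all.x = TRUE)
print(result)
```

The expanded table grows with the total width of all ranges, which is fine for short ranges like these but can get large for wide ones.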
Would this work?
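library(dplyr)
library(tidyr)
library(purrr)
# map2() builds a list-column of Start:Stop vectors; unnest() gives one row per Jak value
df2 %>% mutate(Jak = map2(Start, Stop, `:`)) %>%
  unnest(Jak) %>% select(ID, Info, Jak) %>% right_join(df1) %>%  # keep all df1 rows
  arrange(Jak) %>% select(Jak, ID, Info)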
Joining, by = "Jak"
# A tibble: 10 x 3
Jak ID Info
<dbl> <chr> <chr>
1 1 Ab Abacus
2 2 Ab Abacus
3 3 Ab Abacus
4 4 NA NA
5 5 NA NA
6 6 NA NA
7 7 Bc Because
8 8 Bc Because
9 9 Bc Because
10 10 Bc Because
Data used:
df1
# A tibble: 10 x 1
Jak
<dbl>
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
df2
# A tibble: 2 x 4
Start Stop ID Info
<dbl> <dbl> <chr> <chr>
1 1 3 Ab Abacus
2 7 10 Bc Because
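Only the printed tibbles are shown above, so here is one possible way to recreate df1 and df2 (an assumption: they are built as plain data frames with the same column types; wrap them in tibble::as_tibble() for the tibble printing shown above). The sketch then reproduces the mutate(map2(...)) + unnest() step in base R, to show what that expansion does: each df2 row is repeated once per value in its Start:Stop range.

```r
# Hypothetical reconstruction of df1/df2 from the printed output above
df1 <- data.frame(Jak = as.numeric(1:10))
df2 <- data.frame(Start = c(1, 7), Stop = c(3, 10),
                  ID = c("Ab", "Bc"), Info = c("Abacus", "Because"),
                  stringsAsFactors = FALSE)

# Repeat each df2 row once per value in its Start:Stop range ...
reps <- df2$Stop - df2$Start + 1
long <- df2[rep(seq_len(nrow(df2)), reps), c("ID", "Info")]
# ... and fill in the range values themselves, in the same row order
long$Jak <- unlist(Map(`:`, df2$Start, df2$Stop))
rownames(long) <- NULL
print(long)
```

The resulting columns (ID, Info, Jak) correspond to what the pipeline has after unnest(Jak) %>% select(ID, Info, Jak), just before right_join(df1).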
Assuming the second Stop value in the second data frame is 10, you can use fuzzyjoin:
# Keep every df1 row and attach the df2 row(s) where Jak >= Start and Jak <= Stop
fuzzyjoin::fuzzy_left_join(df1, df2, by = c('Jak' = 'Start', 'Jak' = 'Stop'),
                           match_fun = list(`>=`, `<=`))
# Jak Start Stop ID Info
#1 1 1 3 Ab Abacus
#2 2 1 3 Ab Abacus
#3 3 1 3 Ab Abacus
#4 4 NA NA <NA> <NA>
#5 5 NA NA <NA> <NA>
#6 6 NA NA <NA> <NA>
#7 7 7 10 Bc Because
#8 8 7 10 Bc Because
#9 9 7 10 Bc Because
#10 10 7 10 Bc Because
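fuzzy_left_join() here performs a left non-equi join: a df1 row matches a df2 row when Jak >= Start and Jak <= Stop. If adding a package is not an option, the same condition can be checked directly with a per-row lookup. This is a minimal base R sketch (a swapped-in technique, not part of fuzzyjoin), with df1/df2 recreated from the data above. Unlike the fuzzy join, it keeps only the first matching range when ranges overlap, and it compares every Jak against every range, so it suits small tables.

```r
df1 <- data.frame(Jak = as.numeric(1:10))
df2 <- data.frame(Start = c(1, 7), Stop = c(3, 10),
                  ID = c("Ab", "Bc"), Info = c("Abacus", "Because"),
                  stringsAsFactors = FALSE)

# For each Jak, the index of the first df2 row with Start <= Jak <= Stop, or NA
idx <- vapply(df1$Jak, function(j) {
  hit <- which(df2$Start <= j & j <= df2$Stop)
  if (length(hit) > 0) hit[1] else NA_integer_
}, integer(1))

# Indexing with NA yields an all-NA row, which is what a left join produces
out <- cbind(df1, df2[idx, c("ID", "Info")])
rownames(out) <- NULL
print(out)
```

This drops the Start and Stop columns, giving the Jak/ID/Info layout asked for in the question.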