
Combine columns based on a sequential range in R

I have two dataframes that I want to combine.

Jak
1
2
3
4
5
6
7
8
9
10


Start    Stop    ID    Info
1        3       Ab    Abacus
7        10      Bc    Because   

I want the final dataframe to be:

Jak  ID    Info
1    Ab    Abacus
2    Ab    Abacus
3    Ab    Abacus
4
5
6
7    Bc    Because
8    Bc    Because
9    Bc    Because
10   Bc    Because

Each number in Jak should be matched against the Start and Stop columns: if it falls within a range, the ID and Info values from that row are added, and otherwise those columns are left empty.

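In base R you can expand each Start-to-Stop range with seq and then merge the result with the first data frame:

# Map() always returns a list; mapply() would simplify to a matrix
# whenever all ranges happen to have the same length.
merge(x, do.call(rbind, Map(data.frame, Jak=Map(seq, y$Start, y$Stop), ID=y$ID,
  Info=y$Info)), all.x=TRUE)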
#   Jak   ID    Info
#1    1   Ab  Abacus
#2    2   Ab  Abacus
#3    3   Ab  Abacus
#4    4 <NA>    <NA>
#5    5 <NA>    <NA>
#6    6 <NA>    <NA>
#7    7   Bc Because
#8    8   Bc Because
#9    9   Bc Because
#10  10   Bc Because

Data:

x <- data.frame(Jak=1:10)
y <- read.table(header=TRUE, text="Start    Stop    ID    Info
1        3       Ab    Abacus
7        10      Bc    Because")
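
To see what the one-liner does, it can be unrolled into steps. This is only a sketch on the same data; the intermediate names ranges, pieces and expanded are mine, and y is built with data.frame here so that the block runs on its own.

```r
x <- data.frame(Jak = 1:10)
y <- data.frame(Start = c(1, 7), Stop = c(3, 10),
                ID = c("Ab", "Bc"), Info = c("Abacus", "Because"),
                stringsAsFactors = FALSE)

# Step 1: one sequence per row of y (a list of length nrow(y))
ranges <- Map(seq, y$Start, y$Stop)

# Step 2: one small data frame per range; ID and Info are recycled
# to the length of the sequence
pieces <- Map(data.frame, Jak = ranges, ID = y$ID, Info = y$Info)

# Step 3: stack the pieces into a single lookup table
expanded <- do.call(rbind, pieces)

# Step 4: left join on the shared column Jak; unmatched rows get NA
result <- merge(x, expanded, all.x = TRUE)
```

The expanded table has one row per covered number, so this can get large if the ranges are wide.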

Does this work? A tidyverse approach:

library(dplyr)
library(tidyr)
library(purrr)
df2 %>%
  mutate(Jak = map2(Start, Stop, `:`)) %>%  # list column: one sequence per range
  unnest(Jak) %>%                            # one row per number in each range
  select(Jak, ID, Info) %>%
  right_join(df1, by = "Jak") %>%            # keep every Jak from df1
  arrange(Jak)
# A tibble: 10 x 3
     Jak ID    Info   
   <dbl> <chr> <chr>  
 1     1 Ab    Abacus 
 2     2 Ab    Abacus 
 3     3 Ab    Abacus 
 4     4 NA    NA     
 5     5 NA    NA     
 6     6 NA    NA     
 7     7 Bc    Because
 8     8 Bc    Because
 9     9 Bc    Because
10    10 Bc    Because

Data used:

df1
# A tibble: 10 x 1
     Jak
   <dbl>
 1     1
 2     2
 3     3
 4     4
 5     5
 6     6
 7     7
 8     8
 9     9
10    10
df2
# A tibble: 2 x 4
  Start  Stop ID    Info   
  <dbl> <dbl> <chr> <chr>  
1     1     3 Ab    Abacus 
2     7    10 Bc    Because

Assuming the Stop value in the second row of the second data frame is 10, you can use fuzzyjoin:

fuzzyjoin::fuzzy_left_join(df1, df2, by = c('Jak' = 'Start', 'Jak' = 'Stop'), 
                           match_fun = list(`>=`, `<=`))

#   Jak Start Stop   ID    Info
#1    1     1    3   Ab  Abacus
#2    2     1    3   Ab  Abacus
#3    3     1    3   Ab  Abacus
#4    4    NA   NA <NA>    <NA>
#5    5    NA   NA <NA>    <NA>
#6    6    NA   NA <NA>    <NA>
#7    7     7   10   Bc Because
#8    8     7   10   Bc Because
#9    9     7   10   Bc Because
#10  10     7   10   Bc Because
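
If you would rather not add a package, the same range test (Jak >= Start and Jak <= Stop) can be written in base R. This is a rough sketch, not a replacement for fuzzyjoin: it assumes the ranges do not overlap (only the first matching range is used), and the names idx and result are mine.

```r
x <- data.frame(Jak = 1:10)
y <- data.frame(Start = c(1, 7), Stop = c(3, 10),
                ID = c("Ab", "Bc"), Info = c("Abacus", "Because"),
                stringsAsFactors = FALSE)

# For each Jak, the row of y whose range contains it (NA if there is none)
idx <- vapply(x$Jak, function(j) {
  hit <- which(j >= y$Start & j <= y$Stop)
  if (length(hit) > 0) hit[1] else NA_integer_
}, integer(1))

# Indexing rows with NA yields all-NA rows, which gives the unmatched cases
result <- cbind(x, y[idx, c("ID", "Info")])
rownames(result) <- NULL
```

This never builds the expanded table, but it loops over every Jak, so it is probably only sensible for small inputs.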
