简体   繁体   中英

Match two dataframe by one column and subtract from matched rows in another column in R

I am trying to do the following:

  1. Split a R dataframe into control and rest (samples)
  2. Match a column of control and samples dataframe (word match)
  3. For each match in that column, subtract sample-control values in another column
  4. Print all matched and non-matched rows in output dataframe

I have tried:

#  input

My.Data <- structure(list(V1 = structure(1:9, .Label = c("a1", "a2", "a3", 
                                                         "a4", "a5", "control1", "control2", "control3", "control4"), class = "factor"), 
                          V2 = structure(c(1L, 1L, 2L, 3L, 5L, 1L, 2L, 3L, 4L), .Label = c("otu1", 
                                                                                           "otu2", "otu3", "otu4", "otu6"), class = "factor"), V3 = structure(c(4L, 
                                                                                                                                                                5L, 6L, 9L, 8L, 3L, 1L, 2L, 7L), .Label = c("ee", "tt", "w", 
                                                                                                                                                                                                            "xx", "xxx", "xy", "yy", "z44", "zz"), class = "factor"), 
                          V4 = c(44L, 52L, 11L, 22L, 91L, 4L, 34L, 33L, 11L)), class = "data.frame", row.names = c(NA, 
                                                                                                                   -9L))

# split groups

control<-My.Data[grep("^control*", My.Data$V1), ]
sample<-My.Data[!grepl("^control*",My.Data$V1),]

# match V2 in control and samples (example match: otu1 with otu1..)
?

# Whenever match found in V2 (multiple match is possible), subtract sample-control values in V4
?

# print all matched (and non-matched) rows in a dataframe

I want to get a output dataframe as:

a1  otu1    xx  40
a2  otu1    xxx 48
a3  otu2    xy  -23
a4  otu3    zz  -11
a5  otu6    z44 91
control4    otu4    yy  11

Thanks.

We can join the two datasets on V2 and subtract the column 'V4'

library(data.table)
setDT(sample)[control, V4 := V4 - i.V4, on = .(V2)]
sample
#   V1   V2  V3  V4
#1: a1 otu1  xx  40
#2: a2 otu1 xxx  48
#3: a3 otu2  xy -23
#4: a4 otu3  zz -11
#5: a5 otu6 z44  91

If we want to bind with the unmatched rows of 'control

rbind(sample, setDT(control)[!sample, on = .(V2)])
#         V1   V2  V3  V4
#1:       a1 otu1  xx  40
#2:       a2 otu1 xxx  48
#3:       a3 otu2  xy -23
#4:       a4 otu3  zz -11
#5:       a5 otu6 z44  91
#6: control4 otu4  yy  11

In tidyverse , we can use left_join and bind_rows with anti_join

library(dplyr)
left_join(sample, control %>% 
     select(V2, V4), by = 'V2') %>%
  transmute(V1, V2, V3, V4 = coalesce(V4.x-V4.y, V4.x)) %>% 
  bind_rows(anti_join(control, sample, by = 'V2'))

One dplyr option could be:

My.Data %>%
 group_by(V2) %>%
 filter(n() > 1) %>%
 mutate(V4 = V4 - V4[grepl("^control", V1)]) %>%
 filter(!grepl("^control", V1)) %>%
 bind_rows(My.Data %>%
            group_by(V2) %>%
            filter(n() == 1))

  V1       V2    V3       V4
  <fct>    <fct> <fct> <int>
1 a1       otu1  xx       40
2 a2       otu1  xxx      48
3 a3       otu2  xy      -23
4 a4       otu3  zz      -11
5 a5       otu6  z44      91
6 control4 otu4  yy       11

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM