简体   繁体   中英

Merge data in r based on conditions and creating a new column

I'm a beginner with R and I want to merge two datasets together, based on the ID column. If the ID number of dataset 2 exists in dataset 1, then I want the data for this particular row from dataset 2 to be added to the row in dataset 1.

Secondly, if there is a match, I want to put a "1" in a new column called match for that specific row/ match and "0" if there wasn't a match.

Example:

Dataset 1: 

Id     category 
123    3
124    1
125    2

Dataset 2: 

Id     score category 
123    0.24    3
124    0.83    1 
126    0.92    2  

Final example with the added column:

Id      score category  match
123    0.24    3         1 
124    0.83    1         1
125    NA      1         0
126    0.92    2         1

I've tried this so far (and some other combinations), but that did not give me good results:

data <- merge(df1, df2, by ="ID" , all.x = TRUE)

I would really appreciate any help!

Reproducible code:

df1 <- data.frame(ID=c("123","124","125"), category=c(3,1,2)
df2 <- data.frame(ID=c("123","124","126"), score=("0.24","0.83","0.92"), category=c("3","1","2")

You are almost there. I have slightly modified your example.

df1 <- data.frame(ID = c(123, 124, 125),
                  category=c(3, 1, 2))
df1

df2 <- data.frame(ID = c(123, 124, NA),
                  score = c(0.24, 0.83, 0.35),
                  category = c(3, 1, 2))
df2

df2$match <- 1
df2

data <- merge(df1, df2, by = c("ID", "category") , all.x = TRUE)
data$match[is.na(data$match)] <- 0
data

If you can provide appropriate result for the data you provided, I can update it if needed. However, you can flag based ID in df2 that matches in df1 by the following steps.

update: Looks like you only want to display the x value and not y value. Updated the output by calling all.x =T

data <- merge(df1, df2, by = c('ID', 'category') ,  all.x = T)
data$match <- ifelse(data$ID %in% df2$ID, 1, 0)

output

    ID category score match
1 123        3  0.24     1
2 124        1  0.83     1
3 125        2  <NA>     0

data

 df1 <- data.frame(ID=c("123","124","125"), category=c(3,1,2))
 df2 <- data.frame(ID=c("123","124","126"), score= c("0.24","0.83","0.92"),
               category=c(3,1,2))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM