
Check if a column contains part of another column in R

I have a dataframe with registration numbers in one column and the correct registration numbers in another:

a <- c("0c1234", "", "2468O")
b <- c("Oc1234", "Oc5678", "Oc9123")
df <- data.frame(a, b)

Row 1 was entered incorrectly, so I want to update it. Row 2 is blank, so I want to update that field as well. Row 3 holds a different number, which I want to keep, but I need to make a new entry for this row (that happens in another program; I just need to know that it has to be inserted).

How do I produce this dataframe?

c <- c("update", "update", "insert")
df2 <- data.frame (a,b,c)

I have tried grepl and str_detect, and also considered regular expressions with grepl (i.e. checking whether the 4-digit combination in column a appears in column b), but so far without success.

You can do something like this:

df$c <- ifelse(df$a == '', 'update', 'insert')

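Your output will be as follows. Note that only the blank row is marked update, so row 1 comes out as insert rather than the update shown in the desired df2 in your question: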

       a      b      c
1 0c1234 Oc1234 insert
2        Oc5678 update
3  2468O Oc9123 insert

This will only work, of course, if your original data frame has 'transactions' in proper order.
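The blank check alone cannot tell a mistyped number from a different one. A minimal sketch of the digit comparison mentioned in the question follows, assuming each registration number contains a run of exactly four digits that identifies it. The names m, digits and found are made up for this illustration.

```r
a <- c("0c1234", "", "2468O")
b <- c("Oc1234", "Oc5678", "Oc9123")
df <- data.frame(a, b, stringsAsFactors = FALSE)

# Position of the first run of four digits in a (-1 when there is none)
m <- regexpr("[0-9]{4}", df$a)
# The four digits themselves, or "" when a has no such run
digits <- ifelse(m > 0, substr(df$a, m, m + 3), "")

# TRUE when the digits from a occur literally in the same row of b
found <- mapply(
  function(d, y) d != "" && grepl(d, y, fixed = TRUE),
  digits, df$b, USE.NAMES = FALSE
)

# Blank a or matching digits means update, anything else means insert
df$c <- ifelse(df$a == "" | found, "update", "insert")
df
```

With the sample data this labels rows 1 and 2 as update and row 3 as insert. A non-blank value in a without four consecutive digits would be labelled insert, which may or may not be what you want.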

You can do it this way:

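df <- data.frame(a, b, stringsAsFactors = FALSE)
df$c <- NA_character_  # create the column before filling it row by row

for (i in seq_len(nrow(df))) {
    # 'update' when a is blank or approximately matches b (agrep is a fuzzy match)
    if (df$a[i] == '' || length(agrep(df$a[i], df$b[i])) > 0)
        df$c[i] <- 'update'
    else
        df$c[i] <- 'insert'
}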

df

##       a      b      c
##1 0c1234 Oc1234 update
##2        Oc5678 update
##3  2468O Oc9123 insert
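The same row-by-row check can also be written without an explicit loop. This is only a sketch under the same assumptions as the loop above: it uses agrepl with its default tolerance, and close_enough is a name invented here.

```r
a <- c("0c1234", "", "2468O")
b <- c("Oc1234", "Oc5678", "Oc9123")
df <- data.frame(a, b, stringsAsFactors = FALSE)

# For each row: TRUE when a is blank or approximately matches b.
# The blank test comes first because agrepl() rejects an empty pattern.
close_enough <- mapply(
  function(x, y) x == "" || agrepl(x, y),
  df$a, df$b, USE.NAMES = FALSE
)

df$c <- ifelse(close_enough, "update", "insert")
df
```

Fuzzy matching can also accept values that differ in a digit rather than a letter, so it is worth checking the default tolerance against real data before relying on it.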
