I'm working with several data sets at a company where different groups write their project codes slightly differently.
For example, Group A uses a 5-character project code such as C106A, while Group B uses a long code such as HD-01 C106A 00. Note that the 5-character code appears inside the long one.
What I'm trying to do is group all this project data together, and a crucial step in the data cleanup is fixing these codes so that I can group by them. I have a library of all the long codes, and because Group B is subject to more regulations, their data is fairly reliable.
I would like my code to search Group B's library using each string in Group A's data and, when it finds a match in Group B, replace the value in Group A's data with Group B's long code. I've been playing with stringr's str_detect(), but I can't seem to get it to work.
Portion of Group A's Project Codes:
Project.Code
C106A
C117A
C254A
C342A
C365A
C371A
C391A
C397A
C397B
C397C
C399A
C400A
C404A
C405A
C414A
C417A
Portion of Group B's Library:
Project.Code
HP-C3651001
HP-C3651003
HP-C3651009
HP-C3651P00
HP-C365A000
HP-C365B000
HP-C3421001
HP-C3421002
HP-C3421003
HP-C3421P00
HP-C342A000
HP-C1061001
HP-C1061011
HP-C1061013
HP-C1061016
HP-C1061P00
HP-C106A000
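Judging only from the library sample above, the 5-character code seems to sit at a fixed position in each long code (characters 4 to 8 after the "HP-" prefix, e.g. "HP-C106A000"). If that assumption holds for the whole library, no substring search is needed at all: derive that key from each long code and do an exact lookup with match(). This is a minimal sketch; group_a and group_b are small subsets of the listings above, and it would not handle codes written like the "HD-01 C106A 00" example.

```r
# ASSUMPTION: every library code looks like "HP-" + 8 characters, with the
# 5-character Group A code in positions 4 to 8 (e.g. "HP-C106A000").
group_a <- c("C106A", "C117A", "C342A", "C365A")
group_b <- c("HP-C3651001", "HP-C365A000", "HP-C365B000",
             "HP-C3421001", "HP-C342A000", "HP-C1061001", "HP-C106A000")

# Derive the short key from each long code ("HP-C365A000" -> "C365A")
short_key <- substr(group_b, 4, 8)

# Exact lookup: position of each Group A code among the keys (NA if absent)
pos <- match(group_a, short_key)

# Replace matched codes with the long code; keep unmatched codes unchanged
group_a_fixed <- ifelse(is.na(pos), group_a, group_b[pos])
group_a_fixed
```

Here C117A has no library entry, so it is left as is. An exact key lookup also avoids accidental partial matches that a substring search can produce.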
Something like this makes sense and works:
str_detect(GroupA$Project.Code,"C365A")
but I can't seem to do this:
str_detect(GroupA$Project.Code,GroupB$Project.Code)
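The second call fails for two reasons: str_detect(string, pattern) is vectorised element by element (string 1 against pattern 1, string 2 against pattern 2, and so on), so two vectors of different lengths are either recycled or rejected with an error, depending on the stringr version; and the arguments are reversed, since the short Group A code should be the pattern and the long Group B code the string. A minimal base R sketch that asks the intended question, "does this Group A code occur inside any Group B code?", loops over Group A explicitly (group_a and group_b are small illustrative subsets of the data above):

```r
group_a <- c("C106A", "C117A", "C365A")
group_b <- c("HP-C3651001", "HP-C365A000", "HP-C106A000")

# For every Group A code, test all Group B codes at once and collapse with any().
# fixed = TRUE treats the code as a literal string rather than a regex.
found <- vapply(group_a,
                function(code) any(grepl(code, group_b, fixed = TRUE)),
                logical(1))
found
```

The result is a logical vector named by the Group A codes: TRUE for C106A and C365A, FALSE for C117A.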
One option is to paste Group A's codes into a single string separated by | (which means OR in a regular expression) and detect that pattern in Group B's codes:
library(stringr)
# TRUE for each Group B code that contains any of the Group A codes
str_detect(GroupB$Project.Code, str_c(GroupA$Project.Code, collapse = "|"))
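The alternation pattern only says whether a Group B code matched, not which Group A code it contains. A possible extension, sketched here in base R rather than stringr, keeps the same "C106A|C117A|..." pattern but uses regexpr() and regmatches() to pull out the matched substring. It assumes the codes contain no regex metacharacters (true for the samples shown); group_a and group_b are small subsets of the data above.

```r
group_a <- c("C106A", "C117A", "C342A", "C365A")
group_b <- c("HP-C3651001", "HP-C365A000", "HP-C365B000",
             "HP-C342A000", "HP-C1061001", "HP-C106A000")

# One regex of the form "C106A|C117A|C342A|C365A"
pattern <- paste(group_a, collapse = "|")

# Position of the first match in each Group B code (-1 where there is none)
m <- regexpr(pattern, group_b)

# Which Group A code each Group B code contains (NA where none)
matched_code <- rep(NA_character_, length(group_b))
matched_code[m > 0] <- regmatches(group_b, m)

data.frame(group_b, matched_code)
```

With matched_code in hand, match(group_a, matched_code) gives the position of the long code for each Group A code.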
Not sure if the following is what you want, but it finds, for each Group A code, the Group B codes that contain it:
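# For each Group A code, the positions in grpB of the codes containing it; drop codes with no match
idx <- Filter(length, sapply(grpA, function(x) grep(x, grpB)))
# One row per matched Group A code (assumes each code matches at most one Group B code)
df <- data.frame(grpA_idx = match(names(idx), grpA), grpA_code = names(idx), grpB_code = grpB[unlist(idx)])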
which gives:
> df
grpA_idx grpA_code grpB_code
1 1 C106A HP-C106A000
2 4 C342A HP-C342A000
3 5 C365A HP-C365A000
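The table above lists the matches but does not yet overwrite Group A's values, and building it with data.frame() would break if one code matched several library entries. A minimal sketch of the replacement step, assuming the first matching library code is the one to keep, also reports how many library codes each Group A code matched so ambiguous cases are easy to spot (group_a and group_b are small subsets of the DATA below):

```r
group_a <- c("C106A", "C117A", "C342A", "C365A")
group_b <- c("HP-C3651001", "HP-C365A000", "HP-C365B000",
             "HP-C342A000", "HP-C1061001", "HP-C106A000")

# All Group B codes containing each Group A code (literal match, not regex)
hits <- lapply(group_a, function(code) grep(code, group_b, fixed = TRUE, value = TRUE))
n_hits <- lengths(hits)

# ASSUMPTION: when there are several hits, the first one is used
first_long <- vapply(hits,
                     function(h) if (length(h) == 0) NA_character_ else h[1],
                     character(1))

# Overwrite matched codes; keep the original short code when nothing matched
group_a_fixed <- ifelse(is.na(first_long), group_a, first_long)
data.frame(original = group_a, replaced = group_a_fixed, n_hits = n_hits)
```

Any row with n_hits greater than 1 deserves a manual check before grouping by the replaced code.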
DATA
grpA <- c("C106A", "C117A", "C254A", "C342A",
"C365A", "C371A", "C391A", "C397A", "C397B", "C397C", "C399A",
"C400A", "C404A", "C405A", "C414A", "C417A")
grpB <- c("HP-C1061001", "HP-C1061011", "HP-C1061013", "HP-C1061016",
"HP-C1061P00", "HP-C106A000", "HP-C3421001", "HP-C3421002", "HP-C3421003",
"HP-C3421P00", "HP-C342A000", "HP-C3651001", "HP-C3651003", "HP-C3651009",
"HP-C3651P00", "HP-C365A000", "HP-C365B000")