I have multiple columns which contain strings of data.
(data$product, data$price, data$Overview1, data$Overview2, data$Overview3, data$Overview4)
I would like to create a new vector which contains only the strings that begin with "Material:".
Setting the pattern for GREP
matpattern <- "((?<=Material: ).*|(?<=Materials: ).*)"
Get strings which have material at start
mat <- gregexpr(matpattern, data$Overview1, perl=TRUE)
Create vector to store string
data$material1 <- regmatches(data$Overview1, mat, invert = FALSE)
Repeat for Overview2
mat <- gregexpr(matpattern, data$Overview2, perl=TRUE)
data$material2 <- regmatches(data$Overview2, mat, invert = FALSE)
The statement
z <- cbind(data$material1, data$material2)
gives a matrix, but I want a list.
Is there a way to use lapply and gregexpr across multiple columns and then place the extracted strings in a single column?
I have looked at the question below, to no avail. Thanks for your help.
Convert R vector to string vector of 1 element
OK. This is a complete hack, but I would like the final output to be a vector rather than a list (which rules out apply and lapply?).
This gets the location and length of the required string across the 4 columns
m1 <- gregexpr(matpattern, data[ ,c("Overview1")], perl=TRUE)
m2 <- gregexpr(matpattern, data[ ,c("Overview2")], perl=TRUE)
m3 <- gregexpr(matpattern, data[ ,c("Overview3")], perl=TRUE)
m4 <- gregexpr(matpattern, data[ ,c("Overview4")], perl=TRUE)
This operation extracts the matched strings; each result is a list with one character vector per row
mat1 <- regmatches(data[ ,c("Overview1")], m1, invert = FALSE)
mat2 <- regmatches(data[ ,c("Overview2")], m2, invert = FALSE)
mat3 <- regmatches(data[ ,c("Overview3")], m3, invert = FALSE)
mat4 <- regmatches(data[ ,c("Overview4")], m4, invert = FALSE)
Then I paste all the vectors into one big one (future operations will ignore 'character(0)')
data$Material <- paste(mat1, mat2, mat3, mat4)
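The four repeated gregexpr/regmatches calls above could probably be folded into one lapply over the column names. The following is only a minimal sketch under assumptions: the sample data frame is made up for illustration, only two Overview columns are shown, and rows without a match end up as an empty string rather than 'character(0)'.

```r
# Hypothetical sample data; the column names follow the question.
data <- data.frame(
  price = c(10, 20, 30),
  Overview1 = c("Material: Cotton", "Size: Large", "Colour: Red"),
  Overview2 = c("Size: Small", "Materials: Wool", "Weight: 2kg"),
  stringsAsFactors = FALSE
)

matpattern <- "((?<=Material: ).*|(?<=Materials: ).*)"
cols <- c("Overview1", "Overview2")

# One list of matches per column; each list holds one character vector per row.
matches <- lapply(data[cols], function(x) {
  regmatches(x, gregexpr(matpattern, x, perl = TRUE))
})

# Combine the per-column matches row by row, then collapse each row
# into a single string so the result is a plain character vector.
by_row <- do.call(Map, c(f = c, unname(matches)))
data$Material <- vapply(by_row, paste, character(1), collapse = " ")
```

Because vapply is told to return character(1) for every row, data$Material is an ordinary character vector and not a list, which is the output shape asked for above.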
I can then use this vector to calculate the mean of data$price based on occurrence of certain text strings in data$Material
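As a rough illustration of that last step, grepl can flag the rows whose Material mentions a given term, and the flag can then subset data$price. The data and the search terms below are invented for the example; only the column names price and Material come from the post.

```r
# Hypothetical data: Material holds the pasted strings, price is numeric.
data <- data.frame(
  price = c(10, 20, 30, 40),
  Material = c("Cotton", "Wool character(0)", "Cotton blend", ""),
  stringsAsFactors = FALSE
)

# Rows whose Material mentions "Cotton" (fixed = TRUE treats it as literal text).
is_cotton <- grepl("Cotton", data$Material, fixed = TRUE)
mean_cotton <- mean(data$price[is_cotton])

# The same idea applied to several search terms at once.
terms <- c("Cotton", "Wool")
mean_by_term <- vapply(terms, function(term) {
  mean(data$price[grepl(term, data$Material, fixed = TRUE)])
}, numeric(1))
```

A term that matches no rows would give NaN here, since mean of an empty vector is NaN.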