
R : gregexpr across multiple columns and return single vector

I have multiple columns which contain strings of data.

(data$product, data$price, data$Overview1, data$Overview2, data$Overview3, data$Overview4)

I would like to create a new vector which only contains strings which begin with the string "Material:"

Setting the pattern for GREP

    matpattern <- "((?<=Material: ).*|(?<=Materials: ).*)"

Get strings which have material at start

    mat <- gregexpr(matpattern, data$Overview1, perl=TRUE)

Create vector to store string

    data$material1 <- regmatches(data$Overview1, mat, invert = FALSE)
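For reference, a minimal sketch of what this step returns, using a made-up two-element vector (the sample strings are assumptions, not the real data). When the match data comes from `gregexpr`, `regmatches` gives back a list with one character vector per input element, and `character(0)` where nothing matched:

```r
# Hypothetical sample standing in for data$Overview1
overview <- c("Material: Cotton", "Colour: Blue")
matpattern <- "((?<=Material: ).*|(?<=Materials: ).*)"

m <- gregexpr(matpattern, overview, perl = TRUE)
res <- regmatches(overview, m)

# Inspect the structure: a list of length 2, not a character vector
str(res)
```

This is why assigning the result to a data frame column produces a list column rather than a column of strings.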

/ repeat for overview2 /

    mat <- gregexpr(matpattern, data$Overview2, perl=TRUE)

    data$material2 <- regmatches(data$Overview2, mat, invert = FALSE)

The statement

    z <- cbind(data$material1, data$material2)

gives a matrix when I want a list

Is there a method to get lapply & gregexpr to work across multiple columns and then place the new strings in a single column?

I have looked at the questions below, to no avail. Thanks for your help.

Convert R vector to string vector of 1 element

Regular Expressions in R - compare one column to another

Using regexp to select rows in R dataframe

OK. This is a complete hack, but I would like the final output to be a vector rather than a list (which rules out apply and lapply?).

This gets the location and length of the required string across the 4 columns

    m1 <- gregexpr(matpattern, data[ ,c("Overview1")], perl=TRUE)
    m2 <- gregexpr(matpattern, data[ ,c("Overview2")], perl=TRUE)
    m3 <- gregexpr(matpattern, data[ ,c("Overview3")], perl=TRUE)
    m4 <- gregexpr(matpattern, data[ ,c("Overview4")], perl=TRUE)

This operation extracts the matched strings. Each result is a list with one element per row:

    mat1 <- regmatches(data[ ,c("Overview1")], m1, invert = FALSE)
    mat2 <- regmatches(data[ ,c("Overview2")], m2, invert = FALSE)
    mat3 <- regmatches(data[ ,c("Overview3")], m3, invert = FALSE)
    mat4 <- regmatches(data[ ,c("Overview4")], m4, invert = FALSE)
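The calls above differ only in the column name, so they could probably be folded into a single `lapply` over the columns. A minimal sketch, assuming the same `matpattern` and a small made-up `data` frame (the sample rows and the name `mats` are hypothetical):

```r
# Hypothetical sample data; the real `data` has more rows and columns
data <- data.frame(
  Overview1 = c("Material: Cotton", "Size: M"),
  Overview2 = c("Size: L", "Materials: Wool"),
  Overview3 = c("Colour: Blue", "Colour: Grey"),
  Overview4 = c("Fit: Regular", "Fit: Loose"),
  stringsAsFactors = FALSE
)
matpattern <- "((?<=Material: ).*|(?<=Materials: ).*)"

cols <- paste0("Overview", 1:4)

# One list per column; mats$Overview1 plays the role of mat1, and so on
mats <- lapply(data[cols], function(x) {
  regmatches(x, gregexpr(matpattern, x, perl = TRUE))
})

str(mats$Overview1)
```

Each element of `mats` is still a list with one entry per row, so this only removes the repetition; it does not by itself produce a plain vector.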

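Then I paste all the results into one character vector. paste() converts each list element to text, so rows with no match contribute the literal string 'character(0)', which future operations will ignore.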

    data$Material <- paste(mat1, mat2, mat3, mat4)

I can then use this vector to calculate the mean of data$price based on occurrence of certain text strings in data$Material
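A sketch of that last step, together with one way to get a plain character vector without the pasted 'character(0)' text. The sample `data`, the helper name `extract_first`, and the switch to `regexpr` (which keeps only the first match per cell) are assumptions for illustration, not part of the code above:

```r
# Hypothetical sample data with two of the overview columns
data <- data.frame(
  price = c(10, 25, 30, 40),
  Overview1 = c("Material: Cotton", "Size: M", "Material: Cotton", "Size: S"),
  Overview2 = c("Size: L", "Materials: Wool", "Size: M", "Colour: Red"),
  stringsAsFactors = FALSE
)
matpattern <- "((?<=Material: ).*|(?<=Materials: ).*)"

# Hypothetical helper: first match per element, NA where nothing matches
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)   # regmatches returns only the matched elements
  out
}

cols <- grep("^Overview", names(data), value = TRUE)
per_col <- lapply(data[cols], extract_first, pattern = matpattern)

# Collapse the columns: keep the first non-NA value found in each row
data$Material <- Reduce(function(a, b) ifelse(is.na(a), b, a), per_col)

# Mean price for rows whose material mentions a given string
cotton_mean <- mean(data$price[grepl("Cotton", data$Material)])

# Mean price per distinct material; rows with NA material are left out
mean_price <- tapply(data$price, data$Material, mean)
```

Using `NA` for rows without a match keeps `data$Material` an ordinary character column, so `grepl` and `tapply` work on it directly.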
