I'm creating a matrix of 1s and 0s, where an entry is 1 if a word is part of a string and 0 otherwise.
For example, the expected matrix would look something like this:
                           white hanging heart holder black suitcase
white hanging heart holder     1       1     1      1     0        0
black suitcase                 0       0     0      0     1        1
What I have at my disposal are two vectors:
Itemsvector = c("white hanging heart holder","black suitcase", ...)
Wordsvector = c("white","hanging","heart","holder","black", "suitcase",...)
I'm toying around with the %in% operator:
strsplit(Itemsvector[1], split = ' ')[[1]] %in% Wordsvector
and also with grepl:
grepl(Wordsvector[1], Itemsvector)
Both give me TRUE and FALSE values, but I'm at a loss as to how to map these values onto the whole matrix grid.
We can do this much more easily with table: after splitting 'Itemsvector' into a list of vectors, stack it into a data.frame and call table on it.
table(stack(setNames(strsplit(Itemsvector, " "), Itemsvector))[2:1])
#                             values
# ind                          black hanging heart holder suitcase white
#   white hanging heart holder     0       1     1      1        0     1
#   black suitcase                 1       0     0      0        1     0
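Note that table sorts its columns alphabetically and counts occurrences, so the result differs from the layout in the question in column order, and a word repeated within one item would give a count above 1. A minimal sketch of one way to address both, assuming the small sample vectors from the question: turn the stacked words into a factor whose levels follow Wordsvector, then threshold the counts. The names long, tab and m are illustrative only.

```r
Itemsvector <- c("white hanging heart holder", "black suitcase")
Wordsvector <- c("white", "hanging", "heart", "holder", "black", "suitcase")

# One row per (word, item) pair; 'values' holds the word, 'ind' the item
long <- stack(setNames(strsplit(Itemsvector, " "), Itemsvector))

# Fix the column order: factor levels follow Wordsvector.
# Words that are not in Wordsvector become NA and are dropped by table().
long$values <- factor(long$values, levels = Wordsvector)

tab <- table(long[2:1])

# Collapse counts to a 0/1 integer matrix
m <- (unclass(tab) > 0) * 1L
m
```

Because the levels come from Wordsvector, a word that never occurs in any item still gets an all-zero column.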
Or with mtabulate from the qdapTools package:
library(qdapTools)
mtabulate(setNames(strsplit(Itemsvector, " "), Itemsvector))
You could try a double sapply. Since you already have the Wordsvector to search for, there is no need to split Itemsvector again. We can check whether a particular word is present in a particular element of Itemsvector using grepl, and as an extra precaution we add word boundaries so that "white" does not match "whites". The leading + converts the logical matrix to 1s and 0s.
+(t(sapply(Itemsvector, function(x) sapply(Wordsvector, function(y)
grepl(paste0("\\b",y, "\\b"), x)))))
#                            white hanging heart holder black suitcase
# white hanging heart holder     1       1     1      1     0        0
# black suitcase                 0       0     0      0     1        1
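For comparison, here is a sketch that avoids regular expressions altogether by reversing the %in% test from the question: instead of asking which tokens of an item are in Wordsvector, ask which entries of Wordsvector are among the item's tokens. That gives one row per item with a fixed length. It assumes words are separated by single spaces, as in the sample data; tokens and m are illustrative names.

```r
Itemsvector <- c("white hanging heart holder", "black suitcase")
Wordsvector <- c("white", "hanging", "heart", "holder", "black", "suitcase")

# Split each item into its words (literal space, no regex)
tokens <- strsplit(Itemsvector, " ", fixed = TRUE)

# For each item, flag which entries of Wordsvector occur among its tokens,
# then bind the per-item rows into a matrix
m <- do.call(rbind, lapply(tokens, function(tok) as.integer(Wordsvector %in% tok)))
dimnames(m) <- list(Itemsvector, Wordsvector)
m
```

Exact token matching means "white" cannot match "whites", and words containing regex metacharacters need no escaping.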
data
Itemsvector = c("white hanging heart holder","black suitcase")
Wordsvector = c("white","hanging","heart","holder","black", "suitcase")