
How do I create a matrix of 1s and 0s from two vectors of strings?

I'm creating a matrix of 1s and 0s. It is 1 if a word is part of a string, 0 otherwise.

For example, the expected matrix would look something like this:

                           white hanging heart holder black suitcase
white hanging heart holder     1       1     1      1     0        0
black suitcase                 0       0     0      0     1        1

What I have at my disposal are two vectors:

Itemsvector = c("white hanging heart holder","black suitcase", ...)
Wordsvector = c("white","hanging","heart","holder","black", "suitcase",...)

I've been toying with the %in% operator:

strsplit(Itemsvector[1], split = ' ')[[1]] %in% Wordsvector

and also with grepl:

grepl(Wordsvector[1], Itemsvector)

Both give me TRUE and FALSE values, but I'm at a loss as to how to map this set of values onto the whole matrix grid.

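We can do this much more easily with table: split 'Itemsvector' into a list of vectors, stack it into a data.frame, and call table on the result. Note that the word columns come out in alphabetical order rather than in the order of 'Wordsvector'.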

table(stack(setNames(strsplit(Itemsvector, " "), Itemsvector))[2:1])
#                             values
#ind                          black hanging heart holder suitcase white
#  white hanging heart holder     0       1     1      1        0     1
#  black suitcase                 1       0     0      0        1     0
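If the columns need to follow the order of Wordsvector, as in the expected matrix from the question, one possible variation on the table approach is to turn the stacked words into a factor whose levels are Wordsvector. This is only a sketch under that assumption; the names long, counts and indicator are made up for illustration, and any word that is not in Wordsvector is silently dropped.

```r
Itemsvector <- c("white hanging heart holder", "black suitcase")
Wordsvector <- c("white", "hanging", "heart", "holder", "black", "suitcase")

# One row per (word, item) pair, as in the table() approach above
long <- stack(setNames(strsplit(Itemsvector, " ", fixed = TRUE), Itemsvector))

# Fix the column order: words become a factor with Wordsvector as its levels.
# Words not listed in Wordsvector turn into NA and are ignored by table().
long$values <- factor(long$values, levels = Wordsvector)

counts <- table(long[2:1])

# Drop the "table" class and cap counts at 1, so that a word repeated
# inside one item still yields 1 rather than 2
indicator <- unclass(counts)
indicator[] <- as.integer(indicator > 0)

indicator
```

The result is a plain integer matrix with items as rows and words as columns, so it can be passed on to code that expects a matrix rather than a table object.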

Or with mtabulate from the qdapTools package:

library(qdapTools)
mtabulate(setNames(strsplit(Itemsvector, " "), Itemsvector))

You could try a double sapply. Since you already have Wordsvector to search for, there is no need to split Itemsvector again. We can check whether a particular word is present in a particular element of Itemsvector using grepl, and as an extra precaution we add word boundaries (\\b) so that "white" does not match "whites". The outer t() puts the items in rows, and the leading + converts the logical matrix to 1s and 0s.

+(t(sapply(Itemsvector, function(x) sapply(Wordsvector, function(y) 
                                  grepl(paste0("\\b",y, "\\b"), x)))))

#                           white hanging heart holder black suitcase
#white hanging heart holder     1       1     1      1     0        0
#black suitcase                 0       0     0      0     1        1
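If the words might contain regex metacharacters (for example "." or "+"), pasting them into a grepl pattern could misbehave. A possible alternative, sketched here on the assumption that the items are separated by single spaces, is to go back to the %in% idea from the question and compare whole tokens exactly. The names tokens and indicator are made up for illustration.

```r
Itemsvector <- c("white hanging heart holder", "black suitcase")
Wordsvector <- c("white", "hanging", "heart", "holder", "black", "suitcase")

# Split every item into its words; fixed = TRUE treats " " literally
tokens <- strsplit(Itemsvector, " ", fixed = TRUE)

# For each item, flag which entries of Wordsvector appear among its tokens,
# then bind the per-item rows into one matrix
indicator <- do.call(
  rbind,
  lapply(tokens, function(tk) as.integer(Wordsvector %in% tk))
)
dimnames(indicator) <- list(Itemsvector, Wordsvector)

indicator
```

Because %in% compares complete strings, "white" is not matched by a token such as "whites", so no word boundaries are needed.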

data

Itemsvector = c("white hanging heart holder","black suitcase")
Wordsvector = c("white","hanging","heart","holder","black", "suitcase")


 