I am using the 'lda' package in R to run a topic model analysis of a corpus (let's call it 'corpusB'). To prepare the corpus, I first call 'lexicalize', which returns a term-document matrix and, if no vocabulary is supplied, a vocabulary of the unique tokens appearing in the corpus.
For research purposes, I want to lexicalize the corpus using a vocabulary inferred from another corpus (let's call it 'corpusA'), which should be easy to do. Yet it is not working. Here is a sample of the code:
A <- lexicalize(corpusA) #the output of this command is just as expected
B <- lexicalize(corpusB, vocab = A$vocab)
B$documents #let's see the term-document matrix
>>NULL #this is what I get
Any idea why I am getting a NULL result? Strangely enough, the command works just fine if I use simple character vectors rather than imported corpora.
A <- c("I have the very model of a modern major general")
B <- c("I have a major headache")
B1 <- lexicalize(B)
B1
$documents
$documents[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 0 1 2 3 4
[2,] 1 1 1 1 1
$vocab
[1] "i" "have" "a" "major" "headache"
A1 <- lexicalize(A, vocab = B1$vocab)
A1
[[1]]
[,1] [,2] [,3] [,4]
[1,] 0 1 2 3
[2,] 1 1 1 1
A few more pieces of information that might be useful:
1) The corpus I am interested in (corpusB) contains 700 MB of text, a considerable amount of data;
2) Both corpora (B and A) are imported into R using the 'tm' package. Before the lexicalization, I also use 'tm' to remove punctuation, numbers and stopwords, to strip whitespace, and to convert the text to lower case.
Any help is very much appreciated!
lexicalize() expects a character vector of document lines, from which it constructs a corpus and vocabulary suitable for lda. A tm corpus should be transformed to a character vector before applying lexicalize().
texts <- data.frame(text = unlist(sapply(corpusA, `[`, "content")), stringsAsFactors = FALSE)
l_corp <- lexicalize(texts$text)
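For the two-corpus case in the question, the same conversion can be applied to both corpora before lexicalizing. Below is a minimal sketch, not a tested solution for the 700 MB corpus: the toy corpora and the helper name corpus_to_text are made up for illustration, and it assumes each tm document holds plain text. Note that, as in the toy example in the question, lexicalize() returns only the list of documents when a vocab is supplied, so there is no $documents element to look up on that result.

```r
library(lda)
library(tm)

# Toy stand-ins for the two corpora (hypothetical data, not the asker's files).
corpusA <- VCorpus(VectorSource(c(
  "i have the very model of a modern major general",
  "a modern major general has a model"
)))
corpusB <- VCorpus(VectorSource(c(
  "i have a major headache",
  "the general has a headache"
)))

# Hypothetical helper: flatten a tm corpus to a plain character vector,
# one string per document (multi-line documents are pasted together).
corpus_to_text <- function(corpus) {
  unname(vapply(
    corpus,
    function(doc) paste(as.character(doc), collapse = " "),
    character(1)
  ))
}

textsA <- corpus_to_text(corpusA)
textsB <- corpus_to_text(corpusB)

# Build the vocabulary from corpus A only.
A <- lexicalize(textsA)

# Re-use A's vocabulary for corpus B. With vocab supplied, the return
# value is the list of documents itself, so inspect docsB directly
# rather than docsB$documents.
docsB <- lexicalize(textsB, vocab = A$vocab)
docsB[[1]]
```

Words in corpusB that do not occur in A$vocab (here "headache") are simply left out of the resulting documents.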