
Using the lsa package in R - Error in Ops.simple_triplet_matrix(m, 1) : Incompatible dimensions

I am trying to learn to use the lsa package in R. My real data set is much larger than the example below, which I am using here for reproducibility (props to the person who posted this code on his site; it's a great resource).

I get an odd error message which I can't seem to resolve:

Error in Ops.simple_triplet_matrix(m, 1) : Incompatible dimensions. 

Below is some of the code I'm tinkering with:

# load required libraries
library(tm)
library(ggplot2)
library(lsa)
library(SnowballC)
lsa <- function () {
  # 1. Prepare mock data
  text <- c("transporting food by cars will cause global warming. so we should go local.",
            "we should try to convince our parents to stop using cars because it will cause global warming.",
            "some food, such as mongo, requires a warm weather to grow. so they have to be transported to canada.",
            "a typical Electronic Circuit can be built with a battery, a bulb, and a switch.",
            "electricity flows from batteries to the bulb, just like water flows through a tube.",
            "batteries have chemical energe in it. then electrons flow through a bulb to light it up.",
            "birds can fly because they have feather and they are light.",
            "why some birds like pigeon can fly while some others like chicken cannot?",
            "feather is important for birds' fly. if feather on a bird's wings is removed, this bird cannot fly.")
  view <- factor(rep(c("view 1", "view 2", "view 3"), each = 3))
  df <- data.frame(text, view, stringsAsFactors = FALSE)

  # prepare corpus
  corpus <- Corpus(VectorSource(df$text))
  # corpus <- tm_map(corpus, tolower)
  # corpus <- tm_map(corpus, removePunctuation)
  # corpus <- tm_map(corpus, function(x) removeWords(x, stopwords("english")))
  # corpus <- tm_map(corpus, stemDocument, language = "english")
  corpus <- tm_map(corpus, PlainTextDocument)

  # 2. MDS with raw term-document matrix compute distance matrix
  td.mat <- TermDocumentMatrix(corpus)

  td.mat.lsa <- lw_logtf(td.mat) * gw_idf(td.mat)  # weighting
  lsaSpace <- lsa(td.mat.lsa)  # create LSA space
  dist.mat.lsa <- dist(t(as.textmatrix(lsaSpace)))  # compute distance matrix
  return(dist.mat.lsa)  # check distance matrix

}

I can generate the corpus with no problem, and I can convert it to a term-document matrix. The error is triggered when I define td.mat.lsa.

The traceback is as follows:

4 stop("Incompatible dimensions.") 
3 Ops.simple_triplet_matrix(m, 1) 
2 lw_logtf(td.mat) at lsa.R#31
1 lsa() 

My primary questions are therefore:

  1. Why do I get this error?
  2. How can I fix my code to avoid it?

Thanks in advance for any help you can offer here; this is my first post, so feedback on the quality of my question is also welcome!

It has been figured out!

I had wrapped my code in a function that I named 'lsa', and I was then calling 'lsa' inside the body of that function. My own definition masks the package's 'lsa', so in this environment the name refers to a differently defined function, and that is what I traced the incompatible dimensions error to.
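
To illustrate the masking described above, here is a minimal sketch in base R only. It is not the original code: base::mean stands in for the package's lsa function, and the names masked_result, average_demo and fixed_result are made up for the example.

```r
# Hypothetical stand-in: base::mean plays the role of lsa::lsa here.
# A zero-argument wrapper that reuses the name of the function it calls.
mean <- function() {
  x <- c(1, 2, 3)
  mean(x)  # resolves to this wrapper, not base::mean, so the call fails
}

# Capture the error message instead of stopping the script.
masked_result <- tryCatch(mean(), error = function(e) conditionMessage(e))

# Two ways out: give the wrapper its own name, and/or qualify the call.
average_demo <- function() {
  x <- c(1, 2, 3)
  base::mean(x)  # the namespace prefix always reaches the base function
}
fixed_result <- average_demo()

rm(mean)  # remove the masking definition again
```

The same idea should carry over to the question's code: rename the wrapper, or write lsa::lsa(...) for the inner call.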

phew!
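
One more observation, offered tentatively: the traceback in the question stops inside lw_logtf(td.mat), before the inner lsa call is reached, so the sparse TermDocumentMatrix may also play a part. Below is a sketch of how the function could look, assuming the weighting functions expect a dense matrix. The name build_lsa_dist is hypothetical, and the as.matrix() conversion is my assumption rather than something confirmed above.

```r
library(tm)
library(lsa)

# build_lsa_dist is a hypothetical name, chosen so it cannot mask lsa::lsa.
build_lsa_dist <- function(text) {
  corpus <- Corpus(VectorSource(text))
  # Assumption: lw_logtf() and gw_idf() expect a dense matrix,
  # so the sparse term-document matrix is converted first.
  td.mat <- as.matrix(TermDocumentMatrix(corpus))
  td.mat.lsa <- lw_logtf(td.mat) * gw_idf(td.mat)  # weighting
  lsaSpace <- lsa::lsa(td.mat.lsa)                 # qualified package call
  dist(t(as.textmatrix(lsaSpace)))                 # document distances
}
```

Called as build_lsa_dist(df$text) with the data frame from the question, this should return a dist object over the nine documents.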
