How to incorporate features from a latent semantic analysis as independent variables in a predictive model
I am trying to run a logistic regression on text data in R. I have built a term-document matrix and a corresponding latent semantic space. As I understand it, LSA derives 'concepts' from 'terms', which can help with dimensionality reduction. Here's my code:
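library(tm)   # TermDocumentMatrix(), removeSparseTerms()
library(lsa)  # lsa(), dimcalc_share()

tdm = TermDocumentMatrix(corpus, control = list(tokenize = myngramtoken, weighting = myweight))
tdm = removeSparseTerms(tdm, 0.98)
tdm = as.matrix(tdm)
tdm.lsa = lsa(tdm, dimcalc_share())
tdm.lsa_tk = as.data.frame(tdm.lsa$tk)  # term vectors (terms x concepts)
tdm.lsa_dk = as.data.frame(tdm.lsa$dk)  # document vectors (documents x concepts)
tdm.lsa_sk = as.data.frame(tdm.lsa$sk)  # singular values, one per concept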
This gives features named V1, V2, V3, ..., V21. Is it possible to use these as the independent variables in my logistic regression? If so, how can I do it?
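To see where V1, ..., V21 come from, the sketch below reproduces the decomposition with base R's `svd()` on a small made-up term-document matrix. It assumes that `lsa()` performs a truncated SVD, so that the matrix is approximated by `tk %*% diag(sk) %*% t(dk)`, and that `dimcalc_share()` (default share 0.5) keeps the smallest number of dimensions whose singular values reach that share of their total. The toy matrix and variable names are illustrative only.

```r
# Minimal sketch of what lsa(tdm, dimcalc_share()) computes, using base R only.
# Assumption: lsa() is a truncated SVD and dimcalc_share(share = 0.5) picks the
# first k whose cumulative share of the singular values reaches 0.5.

# made-up term-document matrix: 6 terms (rows) x 5 documents (columns)
tdm <- matrix(c(2, 0, 1, 0, 0,
                1, 1, 0, 0, 0,
                0, 3, 0, 1, 0,
                0, 0, 2, 0, 1,
                0, 1, 0, 2, 1,
                1, 0, 0, 0, 2),
              nrow = 6, byrow = TRUE,
              dimnames = list(paste0("term", 1:6), paste0("doc", 1:5)))

s <- svd(tdm)  # tdm = s$u %*% diag(s$d) %*% t(s$v)

# choose the number of concepts with a "share" rule
share <- 0.5
k <- which(cumsum(s$d) / sum(s$d) >= share)[1]

tk <- s$u[, 1:k, drop = FALSE]  # term vectors: terms x concepts
dk <- s$v[, 1:k, drop = FALSE]  # document vectors: documents x concepts
sk <- s$d[1:k]                  # singular values, in decreasing order
rownames(tk) <- rownames(tdm)
rownames(dk) <- colnames(tdm)

# rank-k approximation of the original term-document matrix
approx_tdm <- tk %*% diag(sk, nrow = k) %*% t(dk)

# dk has one row per document, so it can serve as a feature table
features <- as.data.frame(dk)   # columns are named V1, ..., Vk
print(dim(features))
```

Because `dk` has one row per document and one column per concept, its columns are exactly the kind of numeric predictors a regression model can take.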
In the example above, the table tdm.lsa_dk is a matrix with the 'concepts' as columns and the documents as rows (in the same order as the documents in the corpus). It can be used as the new data set for further analysis, in this case logistic regression, and split into training and testing sets. The dependent variable (the response from the original data set) has to be added to this new data set. The table tdm.lsa_sk can help with variable selection: it holds one singular value per concept, in decreasing order, so V1 captures the most structure in the term-document matrix, V2 the next most, and so on. Keep in mind that this ordering describes the term-document matrix, not how well each concept predicts the response.
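One way to act on the singular values is to keep only the leading concepts. The sketch below is a hypothetical illustration: the `sk` values and the 0.8 cut-off are made up, and in practice `sk` would be `tdm.lsa$sk`. It uses squared singular values, whose cumulative share is measured only relative to the concepts that `lsa()` already kept.

```r
# Hypothetical sketch: pick how many concept columns to keep from the
# singular values, then build a glm formula from them.
sk <- c(5.2, 3.1, 2.4, 1.0, 0.6, 0.3)  # made-up singular values, decreasing

# share of the retained concepts' total squared norm explained by each concept
explained <- sk^2 / sum(sk^2)
n_keep <- which(cumsum(explained) >= 0.8)[1]

# concept columns of tdm.lsa_dk are V1, V2, ... in the same order as sk
keep_cols <- paste0("V", seq_len(n_keep))

# formula for glm() using only the retained concepts
fml <- reformulate(keep_cols, response = "y.var")
print(fml)
```

The resulting formula can be passed to `glm()` in place of `y.var ~ .`. Because this cut-off ignores the response, it may be worth comparing it against simply using all concepts on held-out data.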
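library(caret)  # createDataPartition()

# the $dk part of the lsa output will behave as your new dataset
new.dataset <- tdm.lsa_dk
# add the response; its rows must follow the same document order as the corpus
new.dataset$y.var <- original.dataset$y.var
# creating training and testing datasets out of the new dataset (20% held out for testing)
test_index <- createDataPartition(new.dataset$y.var, p = 0.2, list = FALSE)
Test <- new.dataset[test_index, ]
Train <- new.dataset[-test_index, ]
# create model: logistic regression on all concept columns V1, V2, ...
model <- glm(y.var ~ ., data = Train, family = "binomial")
prediction <- predict(model, Test, type = "response")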
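For a self-contained version of the same workflow, here is a base R sketch that does not need caret. It assumes simulated data: `doc_features` is a random stand-in for `tdm.lsa_dk` and `y.var` is a simulated 0/1 response, so the numbers say nothing about real text. The split uses `sample()` instead of `createDataPartition()`, and predicted probabilities are turned into classes with a 0.5 cut-off.

```r
# Minimal end-to-end sketch in base R with simulated stand-ins for the LSA
# document vectors (tdm.lsa_dk) and the response (original.dataset$y.var).
set.seed(42)
n_docs <- 200
k <- 3

# simulated document-by-concept matrix, standing in for lsa()'s $dk
doc_features <- matrix(rnorm(n_docs * k), nrow = n_docs)
new.dataset <- as.data.frame(doc_features)  # columns V1, V2, V3

# simulated binary response that depends on the first two concepts
linpred <- 1.5 * new.dataset$V1 - 1.0 * new.dataset$V2
new.dataset$y.var <- rbinom(n_docs, size = 1, prob = plogis(linpred))

# hold out 20% of the documents for testing
test_index <- sample(seq_len(n_docs), size = round(0.2 * n_docs))
Test <- new.dataset[test_index, ]
Train <- new.dataset[-test_index, ]

# logistic regression on all concept columns
model <- glm(y.var ~ ., data = Train, family = binomial)

# predicted probabilities for the held-out documents
prediction <- predict(model, newdata = Test, type = "response")

# classify with a 0.5 cut-off and compare with the true labels
predicted_class <- ifelse(prediction > 0.5, 1, 0)
print(table(predicted = predicted_class, actual = Test$y.var))
accuracy <- mean(predicted_class == Test$y.var)
print(accuracy)
```

The 0.5 cut-off is only a default choice; with unbalanced classes a different threshold may suit the problem better.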