How to incorporate features from latent semantic analysis as independent variables in a predictive model

Question

I am trying to run a logistic regression on text data in R. I have built a term-document matrix and a corresponding latent semantic space. As I understand it, LSA derives 'concepts' from 'terms', which can help with dimensionality reduction. Here's my code:

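    library(tm)   # TermDocumentMatrix, removeSparseTerms
    library(lsa)  # lsa, dimcalc_share

    # myngramtoken and myweight are user-defined (not shown here)
    tdm <- TermDocumentMatrix(corpus, control = list(tokenize = myngramtoken, weighting = myweight))
    tdm <- removeSparseTerms(tdm, 0.98)
    tdm <- as.matrix(tdm)
    tdm.lsa <- lsa(tdm, dimcalc_share())
    tdm.lsa_tk <- as.data.frame(tdm.lsa$tk)
    tdm.lsa_dk <- as.data.frame(tdm.lsa$dk)
    tdm.lsa_sk <- as.data.frame(tdm.lsa$sk)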

This gives features named V1, V2, V3, ..., V21. Is it possible to use these as the independent variables in my logistic regression? If so, how can I do it?

Answer 1

In the example above, the table tdm.lsa_dk is a matrix with the 'concepts' as columns and the documents as rows. It can serve as the new data set for training and testing in further analysis, in this case logistic regression. The dependent variable (from the original data set) has to be added to this new data set. The table tdm.lsa_sk can be used for variable selection: it holds the singular values of the 'concept' variables, in decreasing order of importance.

    library(caret)  # createDataPartition

    # the $dk part of the lsa output becomes the new dataset;
    # its rows (documents) must be in the same order as original.dataset
    new.dataset <- tdm.lsa_dk
    new.dataset$y.var <- original.dataset$y.var

    # split the new dataset into testing (20%) and training (80%) sets
    test_index <- createDataPartition(new.dataset$y.var, p = 0.2, list = FALSE)
    Test <- new.dataset[test_index, ]
    Train <- new.dataset[-test_index, ]

    # fit the logistic regression and predict probabilities for the test set
    model <- glm(y.var ~ ., data = Train, family = "binomial")
    prediction <- predict(model, Test, type = "response")
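
To make the variable-selection remark concrete, here is a minimal sketch of how the singular values might be used to decide how many 'concept' columns to keep. It is not the asker's pipeline: it uses base R's svd() on a made-up toy matrix as a stand-in for lsa(), and the 80% threshold and the names share, cum_share and k are assumptions chosen for illustration. The cumulative share of the sum of singular values is, as far as I understand it, the same kind of criterion that dimcalc_share() applies.

```r
# Toy term-document matrix: 6 terms (rows) x 8 documents (columns).
# The counts are random and purely illustrative.
set.seed(1)
tdm <- matrix(rpois(48, lambda = 2), nrow = 6,
              dimnames = list(paste0("term", 1:6), paste0("doc", 1:8)))

# Base-R SVD used as a stand-in for lsa(): tdm = u %*% diag(d) %*% t(v).
# s$d plays the role of $sk and s$v the role of $dk.
s <- svd(tdm)

# Share of the sum of singular values carried by each concept.
# svd() returns the singular values in decreasing order.
share <- s$d / sum(s$d)
cum_share <- cumsum(share)

# Keep the smallest number of leading concepts whose cumulative share
# reaches a hypothetical 80% threshold.
k <- which(cum_share >= 0.8)[1]

# Document-by-concept data frame restricted to the k leading concepts.
# Columns are auto-named V1, V2, ... as in the question.
dk <- as.data.frame(s$v[, seq_len(k), drop = FALSE])
rownames(dk) <- colnames(tdm)

print(round(cum_share, 3))
print(dim(dk))
```

With the real data, the same idea would amount to inspecting tdm.lsa$sk and keeping only the first k columns of tdm.lsa_dk before adding y.var and fitting the model.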

Solution 1
0 2017-07-06 11:34:59
