Why are the signs of my topic weights changing from run to run?
I'm running the LSI program from Gensim's Topics and Transformations tutorial, and for some reason the signs of the topic weights keep switching from positive to negative and vice versa. For example, this is what I get when I print using these lines:
for doc, as_text in zip(corpus_lsi, documents):
    print(doc, as_text)
Run 1
[(0, 0.066007833960900791), (1, 0.52007033063618491), (2, -0.37649581219168904)]
[(0, 0.196675928591421), (1, 0.7609563167700063), (2, 0.5080674581001664)]
[(0, 0.089926399724459982), (1, 0.72418606267525132), (2, -0.408989731553764)]
[(0, 0.075858476521777865), (1, 0.63205515860034334), (2, -0.53935336057339001)]
[(0, 0.10150299184979866), (1, 0.57373084830029653), (2, 0.67093385852959075)]
[(0, 0.70321089393783254), (1, -0.1611518021402539), (2, -0.18266089635241448)]
[(0, 0.87747876731198449), (1, -0.16758906864658912), (2, -0.10880822642632856)]
[(0, 0.90986246868185872), (1, -0.14086553628718496), (2, 0.00087117874886860625)]
[(0, 0.61658253505692762), (1, 0.053929075663897361), (2, 0.25568697959599318)]
Run 2
[(0, 0.066007833960908563), (1, -0.52007033063618446), (2, -0.37649581219168959)]
[(0, 0.19667592859143226), (1, -0.76095631677000253), (2, 0.50806745810016629)]
[(0, 0.089926399724470751), (1, -0.72418606267525032), (2, -0.40898973155376284)]
[(0, 0.075858476521787177), (1, -0.63205515860034223), (2, -0.5393533605733889)]
[(0, 0.10150299184980684), (1, -0.57373084830029419), (2, 0.67093385852959098)]
[(0, 0.70321089393782976), (1, 0.16115180214026417), (2, -0.18266089635241456)]
[(0, 0.87747876731198149), (1, 0.16758906864660211), (2, -0.10880822642632891)]
[(0, 0.90986246868185627), (1, 0.14086553628719861), (2, 0.00087117874886795399)]
[(0, 0.61658253505692828), (1, -0.053929075663887563), (2, 0.25568697959599251)]
Run 3
[(0, 0.066007833960902929), (1, -0.52007033063618535), (2, 0.37649581219168821)]
[(0, 0.19667592859142491), (1, -0.76095631677000497), (2, -0.50806745810016662)]
[(0, 0.089926399724463771), (1, -0.7241860626752511), (2, 0.40898973155376317)]
[(0, 0.075858476521781085), (1, -0.63205515860034334), (2, 0.5393533605733889)]
[(0, 0.10150299184980124), (1, -0.57373084830029542), (2, -0.67093385852959064)]
[(0, 0.70321089393783143), (1, 0.16115180214025732), (2, 0.18266089635241564)]
[(0, 0.87747876731198304), (1, 0.16758906864659326), (2, 0.10880822642632952)]
[(0, 0.90986246868185761), (1, 0.1408655362871892), (2, -0.00087117874886778746)]
[(0, 0.61658253505692784), (1, -0.053929075663894419), (2, -0.25568697959599318)]
I am running Python 3.5.2 on a PC, coding in IntelliJ.
Has anyone encountered this problem, using the Gensim library or elsewhere?
Under the hood, the LSI model is nothing but an implementation of fast truncated SVD. SVD computes eigenvectors (strictly speaking, singular vectors), and these vectors correspond to the topics. However, an eigenvector remains an eigenvector after being multiplied by -1, so the sign may flip from run to run depending on how the algorithm is implemented. In fact, this is the case with the SVD implementation in the popular LAPACK library, and even with the numpy implementation.
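To make the sign ambiguity concrete, here is a minimal sketch using numpy. The matrix `A` is a made-up term-document matrix, not the tutorial corpus. It shows that flipping the sign of one pair of left and right singular vectors still reproduces the same matrix, so both versions are equally valid decompositions:

```python
import numpy as np

# Hypothetical term-document matrix (rows: terms, columns: documents).
A = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Flip the sign of the second left/right singular vector pair.
U_flipped = U.copy()
Vt_flipped = Vt.copy()
U_flipped[:, 1] *= -1
Vt_flipped[1, :] *= -1

reconstruction = U @ np.diag(s) @ Vt
reconstruction_flipped = U_flipped @ np.diag(s) @ Vt_flipped

# Both factorizations rebuild A, so neither sign choice is "the" correct one.
same_matrix = np.allclose(reconstruction, reconstruction_flipped)
print(same_matrix)
```

Because the two sign choices cancel in the product, a solver is free to return either one.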
The sign really does not matter here, since an eigenvector multiplied by -1 is still an eigenvector.
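As a rough check that the flip is harmless, the sketch below takes the first two document vectors from Run 1 and Run 2 in the question (rounded to four decimals) and compares their cosine similarity in each run. The helper `fix_signs` is hypothetical, not part of Gensim. It shows one possible convention for getting stable signs by post-processing the output, if you need them for display or comparison:

```python
import numpy as np

# First two documents from Run 1 and Run 2 of the question, rounded.
run1 = np.array([[0.0660, 0.5201, -0.3765],
                 [0.1967, 0.7610, 0.5081]])
run2 = np.array([[0.0660, -0.5201, -0.3765],
                 [0.1967, -0.7610, 0.5081]])

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A topic's sign flips for every document at once, so similarities agree.
sim_run1 = cosine(run1[0], run1[1])
sim_run2 = cosine(run2[0], run2[1])

def fix_signs(doc_topic):
    """Hypothetical helper: flip each topic column so that its
    largest-magnitude entry is positive."""
    doc_topic = np.asarray(doc_topic, dtype=float)
    rows = np.argmax(np.abs(doc_topic), axis=0)
    signs = np.sign(doc_topic[rows, np.arange(doc_topic.shape[1])])
    signs[signs == 0] = 1.0  # leave all-zero columns unchanged
    return doc_topic * signs

print(np.isclose(sim_run1, sim_run2))
print(np.allclose(fix_signs(run1), fix_signs(run2)))
```

This convention only affects how the numbers are presented. The model itself is unchanged.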
There are a number of possibilities: