How to create a bigram matrix?

Question

I want to make a matrix for a bigram model. How can I do it? Any suggestions that fit my code, please?

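    from collections import Counter

    import nltk

    # Accumulate the tokens of every line; assigning token = line.split()
    # inside the loop would keep only the last line of the file.
    token = []
    with open("Pezeshki339.txt", encoding="utf8") as file:
        for line in file:
            token.extend(line.split())

    # 80/20 train/test split.
    spl = int(0.8 * len(token))
    train = token[:spl]
    test = token[spl:]
    print(len(test))
    print(len(train))

    # Keep only words that occur more than once in the training data.
    cn = Counter(train)
    known_words = {word for word, v in cn.items() if v > 1}

    # Take bigrams from the running text, then drop pairs containing a rare word.
    # (Bigrams over the list of distinct known words would pair words that were
    # never adjacent in the text.)
    bigram = [(w1, w2) for w1, w2 in nltk.bigrams(train)
              if w1 in known_words and w2 in known_words]
    frequency = nltk.FreqDist(bigram)
    for f in frequency:
        print(f, frequency[f])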

I need something like:

             w1        w2        w3      ...     wn
    w1     n(w1w1)   n(w1w2)   n(w1w3)   ...   n(w1wn)
    w2     n(w2w1)   n(w2w2)   n(w2w3)   ...   n(w2wn)
    w3     ...
    ...
    wn     ...

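The same pattern continues for all rows and columns: the cell in row wi and column wj holds n(wiwj), the number of times wj directly follows wi.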
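A minimal pure-Python sketch of that layout, assuming the text is already split into a token list. The sample sentence and the helper names bigram_matrix and print_matrix are hypothetical, not from the question. Rows and columns follow the sorted vocabulary, and cell [i][j] counts how often vocab[j] directly follows vocab[i].

```python
from collections import Counter


def bigram_matrix(tokens):
    """Return (vocab, matrix) where matrix[i][j] counts the bigram (vocab[i], vocab[j])."""
    # Count each adjacent pair (w_k, w_k+1) in the token sequence.
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # sorted() fixes a deterministic row/column order.
    vocab = sorted(set(tokens))
    # Counter returns 0 for pairs that never occur, so unseen cells are 0.
    matrix = [[pair_counts[(row, col)] for col in vocab] for row in vocab]
    return vocab, matrix


def print_matrix(vocab, matrix):
    """Print the matrix with words as both row labels and column headers."""
    cells = [str(n) for row in matrix for n in row]
    width = max(len(s) for s in vocab + cells) + 2
    print(" " * width + "".join(w.rjust(width) for w in vocab))
    for word, row in zip(vocab, matrix):
        print(word.ljust(width) + "".join(str(n).rjust(width) for n in row))


# Hypothetical sample text.
tokens = "the cat sat on the mat the cat ran".split()
vocab, matrix = bigram_matrix(tokens)
print_matrix(vocab, matrix)
```

A dense V x V list of lists is easy to read for a small vocabulary, but most cells are zero for real text, which is why the dictionary-based representation below is usually preferred.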

Answer 1

Since you need a "matrix" of words, you'll use a dictionary-like class. You want a dictionary keyed by every word that appears first in a bigram. To make it two-dimensional, it will be a dictionary of dictionaries: each value is another dictionary whose keys are the second words of the bigrams and whose values are whatever you're tracking (probably the number of occurrences).
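A small sketch of that dictionary of dictionaries built with plain dicts, assuming a pre-tokenized word list; the function name build_bigram_table and the sample sentence are hypothetical. Only observed pairs are stored, so an unseen pair is simply a missing key.

```python
def build_bigram_table(tokens):
    """Dictionary of dictionaries: table[first][second] = number of occurrences."""
    table = {}
    for first, second in zip(tokens, tokens[1:]):
        row = table.setdefault(first, {})     # inner dict for this first word
        row[second] = row.get(second, 0) + 1  # count this (first, second) pair
    return table


tokens = "the cat sat on the mat the cat ran".split()
table = build_bigram_table(tokens)
print(table["the"])                # {'cat': 2, 'mat': 1}
print(table["cat"].get("mat", 0))  # 0: unseen pairs are absent from the inner dict
```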

In NLTK you can do this quickly with a ConditionalFreqDist():

    import nltk
    from nltk.corpus import brown  # needs a one-time nltk.download("brown")

    mybigrams = nltk.ConditionalFreqDist(nltk.bigrams(brown.words()))
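A sketch of reading the ConditionalFreqDist as a matrix, assuming NLTK is installed; a hypothetical sample sentence stands in for the Brown corpus (or the question's train list). Each bigram (w1, w2) is treated as a (condition, sample) pair, so cfd[w1][w2] is the count n(w1w2), and tabulate() prints a row/column table like the one in the question.

```python
import nltk

# Hypothetical sample tokens; in the question this would be the train list.
tokens = "the cat sat on the mat the cat ran".split()

# Rows (conditions) are first words; each row is a FreqDist over following words.
cfd = nltk.ConditionalFreqDist(nltk.bigrams(tokens))

print(cfd["the"]["cat"])  # 2
print(cfd["cat"]["mat"])  # 0: a FreqDist returns 0 for unseen samples

# Use the same word list for rows and columns so the printed table is square.
vocab = sorted(set(tokens))
cfd.tabulate(conditions=vocab, samples=vocab)
```

Passing explicit conditions and samples matters: by default tabulate() only uses words that actually appear as first or second words under the chosen conditions, so the table may not be square.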

But I recommend building your bigram table step by step: you'll understand it better, and you need to understand it before you can use it.

Answer 1: accepted, score 2, 2015-06-09 10:59:19