How do I add words to an existing vocabulary with TensorFlow's contrib.learn?

Question

I am using the TensorFlow vocabulary processor, imported as follows:

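from tensorflow.contrib import learn  # TensorFlow 1.x only; tf.contrib was removed in TF 2.x

length = 20  # maximum document length in tokens (example value)
vocabulary = learn.preprocessing.VocabularyProcessor(length)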

I wrote a unit test to make sure I can save the vocabulary, reload it, and fit new sentences while keeping track of the old ones.

Here are my results:

The fit sentence:  [1 2 3 4 5 6 2 7 8 4 5 9 7]
The new fit sentence:  [0 0 0 2 9 0 6 2 7 8 4 0 0]

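This part works: the word at position 1 of the first sentence ("is", mapped to 2) gets the same value (2) as the word at position 3 of the second sentence, because they are the same word.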

However, I noticed that all new words are mapped to 0.
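A minimal pure-Python sketch of the mapping described above, not TensorFlow code: the tokenizer regex is a simplified assumption that only approximates VocabularyProcessor's default tokenizer, and `fit_transform` / `transform_frozen` are hypothetical helpers. Fitting the first sentence and then looking up the second without adding words reproduces both printed arrays: repeated words share an ID, and every unseen word falls back to 0.

```python
import re

# Simplified tokenizer (assumption): approximates, but is not identical to,
# VocabularyProcessor's default regex tokenizer.
TOKEN_RE = re.compile(r"[\w'\-]+")


def tokenize(text):
    return TOKEN_RE.findall(text)


def pad(ids, max_length):
    """Truncate or right-pad with 0 so the result has exactly max_length IDs."""
    ids = ids[:max_length]
    return ids + [0] * (max_length - len(ids))


def fit_transform(text, vocab, max_length):
    """Give each unseen token the next free ID (0 is reserved for <UNK>)."""
    for token in tokenize(text):
        vocab.setdefault(token, len(vocab) + 1)
    return pad([vocab[t] for t in tokenize(text)], max_length)


def transform_frozen(text, vocab, max_length):
    """Look tokens up without adding new ones; unknown tokens become 0."""
    return pad([vocab.get(t, 0) for t in tokenize(text)], max_length)


vocab = {}
first = "This is a test sentence. It is used to test. sentence, this, used"
second = "Very different uttering is this one. It is used to test."
first_ids = fit_transform(first, vocab, 13)
second_ids = transform_frozen(second, vocab, 13)
print(first_ids)   # [1, 2, 3, 4, 5, 6, 2, 7, 8, 4, 5, 9, 7]
print(second_ids)  # [0, 0, 0, 2, 9, 0, 6, 2, 7, 8, 4, 0, 0]
```

The trailing zeros in the second array are padding (11 tokens padded to 13), which is separate from the zeros produced for unknown words.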

I expected my new sentence to look like this:

[10 11 12 2 9 10 6 2 7 8 4 12 11]

How do I fix this? How can I make the vocabulary processor learn new words?

Thanks!

Edit 1:

Here is a trimmed-down version of my unit test:

import numpy as np
from tensorflow.contrib import learn

# A test sentence
test_sentence = "This is a test sentence. It is used to test. sentence, this, used"
test_sentence_len = len(test_sentence.split(" "))

# A vocabulary processor
vocabulary_processor = learn.preprocessing.VocabularyProcessor(test_sentence_len)

# Turning a list of sentences ( [test_sentence] ) into a list of fit test sentences and taking the first one.
fit_test_sentence = np.array(list(vocabulary_processor.fit_transform([test_sentence])))[0]

# We see that "is" ( position 1 ) and "is" ( position 6 ) are the same. They should have the same numeric value
# in the fit array as well
print("The fit sentence: ", fit_test_sentence)
# self.assertEqual(fit_test_sentence[1], fit_test_sentence[6])

initial_fit_sentence = fit_test_sentence

# Now, let's save

vocabulary_processor.save("some/path")

# Now, we load into a different variable

new_vocabulary_processor = learn.preprocessing.VocabularyProcessor.restore("some/path")

new_test_sentence = "Very different uttering is this one. It is used to test."

# Now, we fit the new sentence with the new vocabulary, which should be the old one
# We should see "is" being transformed into the same numerical value, initial_fit_sentence[1]

new_fit_sentence = np.array(list(new_vocabulary_processor.fit_transform([new_test_sentence])))[0]

print("The new fit sentence: ", new_fit_sentence)
# self.assertEqual(initial_fit_sentence[1], new_fit_sentence[3])

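I tried changing the value of test_sentence_len, thinking the vocabulary might simply be unable to hold any more words, but even if I set it to, say, 1000, it still does not learn new words.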
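The first argument of VocabularyProcessor is the maximum document length: it truncates or pads each sentence to that many IDs rather than capping the vocabulary size, so raising it is not expected to help. The sketch below is plain Python; `pad` is a hypothetical helper and the known IDs are copied from the printed output above. It shows that going from 13 to 1000 only adds padding zeros, while unseen words still map to 0.

```python
def pad(ids, max_length):
    """Truncate or right-pad with 0 so the result has exactly max_length IDs."""
    ids = ids[:max_length]
    return ids + [0] * (max_length - len(ids))


# IDs learned from the first sentence, as printed in the question
known = {"is": 2, "It": 6, "used": 7, "to": 8, "test": 4, "this": 9}
tokens = ["Very", "different", "uttering", "is", "this", "one",
          "It", "is", "used", "to", "test"]
ids = [known.get(token, 0) for token in tokens]  # unseen words -> 0

short = pad(ids, 13)
longer = pad(ids, 1000)
print(short)        # [0, 0, 0, 2, 9, 0, 6, 2, 7, 8, 4, 0, 0]
print(longer[:13])  # same 13 values; the remaining 987 entries are padding
```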

Answer 1

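It looks like the fit_transform method freezes the vocabulary: fit, which fit_transform calls, ends by freezing it, and the frozen state is saved and restored along with the processor. This means anything that was not observed before that point gets ID 0 (UNK). You can unfreeze the vocabulary with new_vocabulary_processor.vocabulary_.freeze(False):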

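new_vocabulary_processor = learn.preprocessing.VocabularyProcessor.restore("some/path")
# Unfreeze so that fitting can assign IDs to unseen words again
new_vocabulary_processor.vocabulary_.freeze(False)
new_test_sentence = "Very different uttering is this one. It is used to test."
new_fit_sentence = np.array(list(new_vocabulary_processor.fit_transform([new_test_sentence])))[0]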
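A minimal pure-Python sketch of the mechanism the answer describes. `FreezableVocabulary` and `fit_transform` here are hypothetical stand-ins written for illustration, not TensorFlow's own classes; they assume, as the answer states, that fitting ends by freezing the vocabulary and that a frozen vocabulary maps unseen words to 0. `pickle` stands in for save/restore, so the frozen flag survives the round trip until it is cleared explicitly.

```python
import pickle


class FreezableVocabulary:
    """Hypothetical stand-in mimicking the freeze behaviour described above."""

    def __init__(self):
        self._mapping = {"<UNK>": 0}  # ID 0 is reserved for unknown words
        self._frozen = False

    def freeze(self, freeze=True):
        self._frozen = freeze

    def get(self, token):
        if token not in self._mapping:
            if self._frozen:
                return 0  # unseen token while frozen -> <UNK>
            self._mapping[token] = len(self._mapping)
        return self._mapping[token]


def fit_transform(vocab, tokens):
    ids = [vocab.get(t) for t in tokens]
    vocab.freeze()  # fitting ends by freezing, as the answer describes
    return ids


vocab = FreezableVocabulary()
first = fit_transform(vocab, ["this", "is", "a", "test"])
restored = pickle.loads(pickle.dumps(vocab))  # frozen flag survives the round trip
frozen_ids = fit_transform(restored, ["is", "very", "new"])
restored.freeze(False)
unfrozen_ids = fit_transform(restored, ["is", "very", "new"])
print(first, frozen_ids, unfrozen_ids)  # [1, 2, 3, 4] [2, 0, 0] [2, 5, 6]
```

Because each fit re-freezes the vocabulary in this model, freeze(False) has to be called again before every later fit that should learn new words.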

Answer 1 (score 0), 2016-12-14 16:46:59