How can I use my own file in this code instead of the hard-coded dataset?

Question

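I am implementing the code below, and it gives me the correct output. However, I would like to save the four "dataSet" entries in a file and read them from there. How can I do that? How can I use my own file instead of a manually typed dataset?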

from naiveBayesClassifier import tokenizer
from naiveBayesClassifier.trainer import Trainer
from naiveBayesClassifier.classifier import Classifier

nTrainer = Trainer(tokenizer)

dataSet = [
    {'text': 'hello everyone', 'category': 'NO'},
    {'text': 'dont use words like jerk', 'category': 'YES'},
    {'text': 'what the hell.', 'category': 'NO'},
    {'text': 'you jerk', 'category': 'yes'},
]

for n in dataSet:
    nTrainer.train(n['text'], n['category'])

nClassifier = Classifier(nTrainer.data, tokenizer)

unknownInstance = "Even if I eat too much, is not it possible to lose some weight"
classification = nClassifier.classify(unknownInstance)
print(classification)

Answer 1

You can store the dataset as a JSON file and then load it in your Python code:

import json

# data.json holds the same list of {'text': ..., 'category': ...} dicts
with open('data.json') as f:
    dataSet = json.load(f)

# Use dataSet exactly as before.
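To make the file format concrete, here is a minimal sketch of how data.json could be created and read back. The file name comes from the answer above; writing the file from Python is only for illustration (it could just as well be typed by hand), and the labels are upper-cased consistently here, which is an assumption about what the asker intended.

```python
import json

# The four training examples from the question (labels upper-cased consistently).
dataSet = [
    {'text': 'hello everyone', 'category': 'NO'},
    {'text': 'dont use words like jerk', 'category': 'YES'},
    {'text': 'what the hell.', 'category': 'NO'},
    {'text': 'you jerk', 'category': 'YES'},
]

# Write the list out once; this step is for illustration only.
with open('data.json', 'w') as f:
    json.dump(dataSet, f, indent=4)

# Later, load the file instead of hard-coding the list.
with open('data.json') as f:
    loaded = json.load(f)

for n in loaded:
    # In the question's code, nTrainer.train(n['text'], n['category']) would go here.
    print('%s -> %s' % (n['text'], n['category']))
```

Because JSON maps directly onto Python lists and dicts, the training loop from the question should not need any changes once dataSet is loaded this way.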

Answer 2

This line appears to do most of the training work:

nTrainer.train(n['text'], n['category'])

This line appears to make the prediction once training is done:

classification = nClassifier.classify(unknownInstance)

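So if you have a list of training texts (the corpus), a list of the corresponding labels, and a list of data to predict (the unknown instances), you can do this: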

from naiveBayesClassifier import tokenizer
from naiveBayesClassifier.trainer import Trainer
from naiveBayesClassifier.classifier import Classifier

corpus = ['hello everyone', 'dont use words like jerk', 'what the hell.', 'you jerk'] # Your training data
labels = ['NO', 'YES', 'NO', 'YES'] # Your labels
unknown_data = ['Even if I eat too much, is not it possible to lose some weight'] # List of data to be predicted

nTrainer = Trainer(tokenizer)

# model training
for item, category in zip(corpus, labels):
    nTrainer.train(item, category)

nClassifier = Classifier(nTrainer.data, tokenizer)
predictions = [nClassifier.classify(unknownInstance) for unknownInstance in unknown_data]

print(predictions)
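The corpus and labels lists above could also be read from a file rather than typed in. The following is a rough sketch using the standard csv module under Python 3; the file name train.csv, the column names text and category, and the helper load_training_data are all hypothetical choices made for this example, not part of naiveBayesClassifier.

```python
import csv

def load_training_data(path):
    """Read parallel lists of texts and labels from a CSV file with a header row."""
    corpus, labels = [], []
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            corpus.append(row['text'])
            labels.append(row['category'])
    return corpus, labels

# Create a small sample file so the sketch runs on its own.
with open('train.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['text', 'category'])  # header row (assumed column names)
    writer.writerows([
        ['hello everyone', 'NO'],
        ['dont use words like jerk', 'YES'],
        ['what the hell.', 'NO'],
        ['you jerk', 'YES'],
    ])

corpus, labels = load_training_data('train.csv')
print(corpus)
print(labels)
```

The two returned lists can then be passed to the zip loop shown above in place of the hard-coded corpus and labels.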

Answer 1  1  2015-11-07 10:32:05
Answer 2  0  2015-11-07 14:28:11
