How to use my own file instead of the hard-coded dataset in this code
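I am implementing this code, and it gives me the correct output, but I want to save these four lines of "dataset" in a file and then use that file. How can I do that? How can I use my own file instead of a dataset typed in by hand?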
from naiveBayesClassifier import tokenizer
from naiveBayesClassifier.trainer import Trainer
from naiveBayesClassifier.classifier import Classifier
nTrainer = Trainer(tokenizer)
dataSet =[
{'text': 'hello everyone', 'category': 'NO'},
{'text': 'dont use words like jerk', 'category': 'YES'},
{'text': 'what the hell.', 'category': 'NO'},
{'text': 'you jerk', 'category': 'YES'},
]
for n in dataSet:
    nTrainer.train(n['text'], n['category'])
nClassifier = Classifier(nTrainer.data, tokenizer)
unknownInstance = "Even if I eat too much, is not it possible to lose some weight"
classification = nClassifier.classify(unknownInstance)
print(classification)
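You can store the dataset as a JSON file and then load it into your Python code. The file must hold a JSON array of objects with "text" and "category" keys; JSON requires double-quoted strings, so the Python literal with single quotes cannot be pasted into the file as-is: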
import json
with open('data.json') as f:
    dataSet = json.load(f)
# dataSet is now a list of dicts, just like the hard-coded one, so the training loop works unchanged.
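A minimal round-trip sketch using only the standard library (assuming Python 3): it writes the four records from the question to a JSON file once and reads them back, checking that every record has the two keys the training loop uses. The helper names `save_dataset` and `load_dataset` are hypothetical, chosen for illustration; `data.json` is the file name from the answer above.

```python
import json

# The same four records as the hard-coded dataSet in the question.
dataSet = [
    {'text': 'hello everyone', 'category': 'NO'},
    {'text': 'dont use words like jerk', 'category': 'YES'},
    {'text': 'what the hell.', 'category': 'NO'},
    {'text': 'you jerk', 'category': 'YES'},
]


def save_dataset(records, path):
    """Write the records as a JSON array (hypothetical helper)."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(records, f, indent=2, ensure_ascii=False)


def load_dataset(path):
    """Read the JSON array back and check each record has 'text' and 'category'."""
    with open(path, encoding='utf-8') as f:
        records = json.load(f)
    for i, record in enumerate(records):
        if 'text' not in record or 'category' not in record:
            raise ValueError('record %d is missing "text" or "category"' % i)
    return records
```

Usage: run `save_dataset(dataSet, 'data.json')` once (or write the file by hand), then replace the hard-coded list with `dataSet = load_dataset('data.json')`. The key check is optional, but it turns a confusing `KeyError` deep inside the training loop into a clear message pointing at the bad record.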
This line appears to do most of the training work:
nTrainer.train(n['text'], n['category'])
This line appears to make the prediction once training is done:
classification = nClassifier.classify(unknownInstance)
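To make those two steps concrete, here is a toy multinomial naive Bayes in plain Python 3. It is not the internals of `naiveBayesClassifier`, whose tokenizer and scoring may differ; it only illustrates what "training" (counting words per category) and "classifying" (scoring each category) roughly mean. The regex tokenizer, add-one smoothing, and the names `ToyTrainer`, `toy_classify`, and `tokenize` are assumptions made for this sketch.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word tokenizer (a simple stand-in, not the library's tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class ToyTrainer:
    """Counts documents per category and word occurrences per category."""

    def __init__(self):
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, text, category):
        tokens = tokenize(text)
        self.doc_counts[category] += 1
        self.word_counts[category].update(tokens)
        self.vocab.update(tokens)


def toy_classify(trainer, text):
    """Return (category, log-score) pairs, best first, using add-one smoothing."""
    total_docs = sum(trainer.doc_counts.values())
    vocab_size = len(trainer.vocab)
    scores = []
    for category, n_docs in trainer.doc_counts.items():
        counts = trainer.word_counts[category]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior P(category)
        for token in tokenize(text):
            # log P(token | category), smoothed so unseen words do not give log(0)
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        scores.append((category, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Trained on the four records from the question, `toy_classify(trainer, 'you jerk')[0][0]` is `'YES'`, because "you" and "jerk" only appear in YES examples. Working in log space avoids floating-point underflow when texts are long.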
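So, if you have a list of texts (the training corpus), a list of corresponding labels, and a list of texts to classify (the unknown instances), you can do this: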
from naiveBayesClassifier import tokenizer
from naiveBayesClassifier.trainer import Trainer
from naiveBayesClassifier.classifier import Classifier
corpus = ['hello everyone', 'dont use words like jerk', 'what the hell.', 'you jerk'] # Your training data
labels = ['NO', 'YES', 'NO', 'YES'] # Your labels
unknown_data = ['Even if I eat too much, is not it possible to lose some weight'] # List of data to be predicted
nTrainer = Trainer(tokenizer)
# model training
for item, category in zip(corpus, labels):
    nTrainer.train(item, category)
nClassifier = Classifier(nTrainer.data, tokenizer)
# classify every unknown instance
predictions = [nClassifier.classify(unknownInstance) for unknownInstance in unknown_data]
print(predictions)
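The three lists above can also come from files instead of being typed in. A minimal sketch, assuming Python 3, a CSV file with a header row and columns named `text` and `category`, and a plain-text file with one text to classify per line; the file layout and the helper names `load_corpus` and `load_unknown` are hypothetical. It also upper-cases labels so that 'yes' and 'YES' do not become two separate categories; drop that if your labels are case-sensitive on purpose.

```python
import csv


def load_corpus(path, text_column='text', label_column='category'):
    """Read a CSV file with a header row into parallel lists of texts and labels.

    The column names are assumptions; change them to match your file.
    """
    corpus, labels = [], []
    # newline='' lets the csv module handle quoted fields that contain commas or newlines.
    with open(path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            corpus.append(row[text_column])
            labels.append(row[label_column].strip().upper())  # normalise 'yes' -> 'YES'
    return corpus, labels


def load_unknown(path):
    """Read one text to classify per line, skipping blank lines."""
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]
```

Usage: `corpus, labels = load_corpus('train.csv')` and `unknown_data = load_unknown('unknown.txt')`, then run the training loop and list comprehension above unchanged. A text containing a comma must be quoted in the CSV, e.g. `"what, the hell.",NO`.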