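Test and train on a CSV file in Python
How can I simply train and test on the data in a CSV file from a Python file in PyCharm?
The data file is at "https://drive.google.com/file/d/1pvcuGk2nRTsYcd-l-_yNBzvvRj2qW5rF/view" and is named "Papers data.csv".
The code below is a simple copy-paste example. How do I replace the files it uses with "Papers data.csv"?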
import logging
import os

import gensim
import smart_open

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

# Sample corpora that ship with gensim
test_data_dir = os.path.join(gensim.__path__[0], 'test', 'test_data')
lee_train_file = os.path.join(test_data_dir, 'lee_background.cor')
lee_test_file = os.path.join(test_data_dir, 'lee.cor')


def read_corpus(fname, tokens_only=False):
    # One document per line; tag each training document with its line number
    with smart_open.open(fname, encoding="iso-8859-1") as f:
        for i, line in enumerate(f):
            tokens = gensim.utils.simple_preprocess(line)
            if tokens_only:
                yield tokens
            else:
                yield gensim.models.doc2vec.TaggedDocument(tokens, [i])


train_corpus = list(read_corpus(lee_train_file))
test_corpus = list(read_corpus(lee_test_file, tokens_only=True))

# Let's take a look at the training corpus
print(train_corpus[:2])

# And the testing corpus looks like this:
print(test_corpus[:2])

model = gensim.models.doc2vec.Doc2Vec(vector_size=50, min_count=2, epochs=40)
model.build_vocab(train_corpus)
model.train(train_corpus, total_examples=model.corpus_count, epochs=model.epochs)

vector = model.infer_vector(['only', 'you', 'can', 'prevent', 'forest', 'fires'])
print(vector)
TensorFlow provides utilities for reading CSV datasets to train and test a model, such as tf.io.decode_csv and tf.data.experimental.CsvDataset.
The TensorFlow documentation includes a complete guide to loading CSV data.
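If the goal is only to feed the CSV into the Doc2Vec code from the question, the standard csv module may be enough. The following is a minimal sketch, not a definitive solution. It rests on several assumptions: the CSV has a header row, the text sits in a column whose name you pass in (the name "abstract" used below is hypothetical, since the actual columns of "Papers data.csv" are not shown), and the file is UTF-8 encoded. The helper names read_csv_corpus and split_train_test are made up for illustration, and the default tokenizer is a plain regex stand-in so the block runs without gensim.

```python
import csv
import random
import re


def read_csv_corpus(path, text_column, tokenize=None, encoding="utf-8"):
    """Yield (row_index, tokens) for each row whose text column is non-empty."""
    if tokenize is None:
        # Stand-in tokenizer (assumption); gensim.utils.simple_preprocess
        # can be passed as `tokenize` to match the question's code.
        def tokenize(text):
            return re.findall(r"[a-z0-9]+", text.lower())
    with open(path, newline="", encoding=encoding) as f:
        # DictReader uses the first row as column names
        for i, row in enumerate(csv.DictReader(f)):
            tokens = tokenize(row.get(text_column) or "")
            if tokens:
                yield i, tokens


def split_train_test(docs, test_fraction=0.2, seed=0):
    """Shuffle a copy of docs and split it into (train, test) lists."""
    docs = list(docs)
    random.Random(seed).shuffle(docs)
    n_test = int(len(docs) * test_fraction)
    return docs[n_test:], docs[:n_test]
```

With these helpers, docs = list(read_csv_corpus("Papers data.csv", "abstract", tokenize=gensim.utils.simple_preprocess)) followed by train, test = split_train_test(docs) gives two lists of (index, tokens) pairs. The training pairs can then be wrapped as gensim.models.doc2vec.TaggedDocument(tokens, [i]) to replace train_corpus, and the token lists from the test pairs can replace test_corpus. The 80/20 split is an arbitrary choice for illustration.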