
Test and train on a CSV file in Python

How can I simply train and test on the data in a CSV file from a Python script in PyCharm?

You can check the data at https://drive.google.com/file/d/1pvcuGk2nRTsYcd-l-_yNBzvvRj2qW5rF/view ; the file name is "Papers data.csv".

Below is simple copy-pasted code. How do I replace the files it uses (lee_background.cor and lee.cor) with "Papers data.csv"?

import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

import os

import gensim

test_data_dir = os.path.join(gensim.__path__[0], 'test', 'test_data')
lee_train_file = os.path.join(test_data_dir, 'lee_background.cor')

lee_test_file = os.path.join(test_data_dir, 'lee.cor')


import smart_open
def read_corpus(fname, tokens_only=False):
    with smart_open.open(fname, encoding="iso-8859-1") as f:
        for i, line in enumerate(f):
            tokens = gensim.utils.simple_preprocess(line)
            if tokens_only:
                yield tokens
            else:
                yield gensim.models.doc2vec.TaggedDocument(tokens, [i])

train_corpus = list(read_corpus(lee_train_file))
test_corpus = list(read_corpus(lee_test_file, tokens_only=True))

# Let's take a look at the training corpus
print(train_corpus[:2])

# And the testing corpus looks like this:
print(test_corpus[:2])

model = gensim.models.doc2vec.Doc2Vec(vector_size=50, min_count=2, epochs=40)
model.build_vocab(train_corpus)

model.train(train_corpus, total_examples=model.corpus_count, epochs=model.epochs)
vector = model.infer_vector(['only', 'you', 'can', 'prevent', 'forest', 'fires'])
print(vector)
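One way to point this script at "Papers data.csv" is to read the CSV with Python's standard `csv` module instead of the `.cor` files, then split its rows into a training set and a test set. This is a minimal sketch, assuming the file has a header row and keeps the text to model in a single column; the helper `split_csv_texts` and whatever column name you pass to it are hypothetical, since the question does not show the file's columns. The default encoding is assumed to be UTF-8; if that fails, try `iso-8859-1`, as the original `read_corpus` does.

```python
import csv
import random


def split_csv_texts(path, text_column, test_fraction=0.2, seed=42, encoding="utf-8"):
    """Read one text column of a CSV file and split it into (train, test) lists.

    Hypothetical helper: `text_column` must be the header name of the column
    in your CSV that holds the documents (e.g. a paper's abstract).
    """
    texts = []
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)  # the first row is treated as the header
        if text_column not in (reader.fieldnames or []):
            raise KeyError(f"column {text_column!r} not in {reader.fieldnames}")
        for row in reader:
            # Short rows get None for missing cells, so fall back to "".
            text = (row[text_column] or "").strip()
            if text:  # skip rows whose text cell is empty
                texts.append(text)

    # Shuffle with a fixed seed so the same split is produced on every run.
    random.Random(seed).shuffle(texts)

    n_test = int(len(texts) * test_fraction)
    return texts[n_test:], texts[:n_test]
```

Usage, assuming the text lives in a column named `abstract` (hypothetical): call `train_texts, test_texts = split_csv_texts("Papers data.csv", "abstract")`, then build the corpora with the same gensim calls that `read_corpus` uses: `train_corpus = [gensim.models.doc2vec.TaggedDocument(gensim.utils.simple_preprocess(t), [i]) for i, t in enumerate(train_texts)]` and `test_corpus = [gensim.utils.simple_preprocess(t) for t in test_texts]`. The rest of the script (`build_vocab`, `train`, `infer_vector`) can stay as it is.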

TensorFlow provides APIs for reading CSV datasets to train and test a model: tf.io.decode_csv and tf.data.experimental.CsvDataset.

The TensorFlow documentation has a complete guide to loading CSV data.
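As a rough illustration of the CsvDataset route, the sketch below reads a CSV file with a header row into a `tf.data` pipeline and splits it into training and test datasets with `skip()` and `take()`. It assumes TensorFlow 2.x in eager mode; the helper name `load_csv_split` and the column types passed as `record_defaults` are illustrative assumptions, not taken from the question's data.

```python
import tensorflow as tf


def load_csv_split(path, record_defaults, test_size, seed=42):
    """Hypothetical helper: read a CSV with a header row into (train, test) datasets."""
    # Each element is a tuple with one scalar tensor per CSV column;
    # header=True skips the first line of the file.
    dataset = tf.data.experimental.CsvDataset(
        path, record_defaults=record_defaults, header=True
    )
    # reshuffle_each_iteration=False keeps the same shuffled order every time
    # the dataset is iterated, so skip() and take() select disjoint rows.
    # A buffer of 10,000 gives a full shuffle only for files up to that many rows.
    dataset = dataset.shuffle(
        buffer_size=10_000, seed=seed, reshuffle_each_iteration=False
    )
    return dataset.skip(test_size), dataset.take(test_size)
```

For example, if "Papers data.csv" had a string column followed by an integer column (an assumption), `load_csv_split("Papers data.csv", [tf.string, tf.int32], test_size=100)` would return a training dataset and a 100-row test dataset. If the CSV lines are already in memory as strings, `tf.io.decode_csv` parses them into one tensor per column in the same way.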


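Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you reproduce this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.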

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM