
theano csv to pkl file

I am trying to build a pkl file, starting from a csv file, that can be loaded into Theano:

import numpy as np
import csv
import gzip, cPickle
from numpy import genfromtxt
import theano
import theano.tensor as T

#Open csv file and read in data
csvFile = "filename.csv"
my_data = genfromtxt(csvFile, delimiter=',', skip_header=1)
data_shape = "There are " + repr(my_data.shape[0]) + " samples of vector length " + repr(my_data.shape[1])

num_rows = my_data.shape[0] # Number of data samples
num_cols = my_data.shape[1] # Length of Data Vector

total_size = (num_cols-1) * num_rows


data = np.arange(total_size)
data = data.reshape(num_rows, num_cols-1) # 2D Matrix of data points
data = data.astype('float32')

label = np.arange(num_rows)
print label.shape
#label = label.reshape(num_rows, 1) # 2D Matrix of data points
label = label.astype('float32')

print data.shape

#Read through data file, assume label is in last col
for i in range(my_data.shape[0]):
    label[i] = my_data[i][num_cols-1]

    for j in range(num_cols-1):
        data[i][j] = my_data[i][j]


#Split data in terms of 70% train, 10% val, 20% test

train_num = int(num_rows * 0.7)
val_num = int(num_rows * 0.1)
test_num = int(num_rows * 0.2)

DataSetState = "This dataset has " + repr(data.shape[0]) + " samples of length " + repr(data.shape[1]) + ". The number of training examples is " + repr(train_num)
print DataSetState



train_set_x = data[:train_num]
train_set_y = label[:train_num]

val_set_x = data[train_num+1:train_num+val_num]
val_set_y = label[train_num+1:train_num+val_num]

test_set_x = data[train_num+val_num+1:]
test_set_y = label[train_num+val_num+1:]


# Divided dataset into 3 parts. split by percentage.

train_set = train_set_x, train_set_y
val_set = val_set_x, val_set_y
test_set = test_set_x, val_set_y


dataset = [train_set, val_set, test_set]

f = gzip.open(csvFile+'.pkl.gz','wb')
cPickle.dump(dataset, f, protocol=2)
f.close()
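Here is a minimal sketch of the 70/10/20 split. It assumes data and label are row-aligned NumPy arrays; plain lists also work, and the usage lines below use lists. Two details in the code above matter here. First, the validation and test slices start at train_num+1 and train_num+val_num+1, which drops one row at each boundary. Second, test_set pairs test_set_x with val_set_y, so the test inputs and test labels end up with different lengths. The helper split_dataset is hypothetical and not part of Theano. It uses contiguous slices so every x row keeps its own y.

```python
def split_dataset(data, label, train_frac=0.7, val_frac=0.1):
    """Split row-aligned data/label sequences into (x, y) pairs for
    train, validation and test. Works with NumPy arrays or lists.
    (Hypothetical helper, not part of Theano.)"""
    num_rows = len(data)
    if len(label) != num_rows:
        raise ValueError("data and label must have the same number of rows")

    train_end = int(num_rows * train_frac)
    val_end = train_end + int(num_rows * val_frac)

    # Contiguous, non-overlapping slices: no row is skipped at a boundary.
    train_set = (data[:train_end], label[:train_end])
    val_set = (data[train_end:val_end], label[train_end:val_end])
    # The test set takes all remaining rows together with *its own* labels.
    test_set = (data[val_end:], label[val_end:])
    return [train_set, val_set, test_set]


# Toy usage: 10 rows of 2 features, label = row index.
rows = [[float(i), float(i) * 2] for i in range(10)]
labels = list(range(10))
train, val, test = split_dataset(rows, labels)
print([len(x) for x, _ in (train, val, test)])  # [7, 1, 2]
```

The test split takes whatever remains after the train and validation splits, so truncation by int() never loses rows. The result can be pickled exactly as in the question, with cPickle.dump(dataset, f, protocol=2).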

When I run the generated pkl file through Theano (as a DBN or SdA), pretraining completes, which makes me think the data has been stored correctly.
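The logistic_sgd.py path in the traceback points to the DeepLearningTutorials code. In that code's SdA and DBN scripts, the pretraining functions are built from train_set_x only, because pretraining is unsupervised. A successful pretraining run therefore does not show that the labels were stored correctly. A quick sanity check is to reload the file and compare the length of each x/y pair. The sketch below does this in Python 3 with the standard-library gzip and pickle modules. The function check_pickled_splits is a hypothetical helper; under Python 2 the same logic should work with cPickle in place of pickle.

```python
import gzip
import pickle


def check_pickled_splits(path):
    """Reload a gzipped [train, val, test] pickle and return (len_x, len_y)
    for each split. Raises ValueError if any pair has mismatched lengths.
    (Hypothetical helper for checking the file before training.)"""
    with gzip.open(path, "rb") as f:
        dataset = pickle.load(f)

    sizes = []
    for name, (x, y) in zip(("train", "val", "test"), dataset):
        sizes.append((len(x), len(y)))
        if len(x) != len(y):
            raise ValueError(
                "%s split has %d inputs but %d labels" % (name, len(x), len(y))
            )
    return sizes


# Example (file name as produced by the question's script):
# check_pickled_splits("filename.csv.pkl.gz")
```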

However, during fine-tuning I get the following error:

epoch 1, minibatch 2775/2775, validation error 0.000000 %

    Traceback (most recent call last):
      File "SdA_custom.py", line 489, in <module>
        test_SdA()
      File "SdA_custom.py", line 463, in test_SdA
        test_losses = test_model()
      File "SdA_custom.py", line 321, in test_score
        return [test_score_i(i) for i in xrange(n_test_batches)]

      File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 606, in __call__
        storage_map=self.fn.storage_map)
      File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 595, in __call__
        outputs = self.fn()
    ValueError: Input dimension mis-match. (input[0].shape[0] = 10, input[1].shape[0] = 3)
    Apply node that caused the error: Elemwise{neq,no_inplace}(argmax, Subtensor{int64:int64:}.0)
    Inputs types: [TensorType(int64, vector), TensorType(int32, vector)]
    Inputs shapes: [(10,), (3,)]
    Inputs strides: [(8,), (4,)]
    Inputs values: ['not shown', array([0, 0, 0], dtype=int32)]

    Backtrace when the node is created:
      File "/home/dean/Documents/DeepLearningRepo/DeepLearningTutorials-master/code/logistic_sgd.py", line 164, in errors
        return T.mean(T.neq(self.y_pred, y))

    HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

10 is my batch size. If I change the batch size to 1, I get this instead:

ValueError: Input dimension mis-match. (input[0].shape[0] = 1, input[1].shape[0] = 0)

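I think I stored the labels incorrectly when creating the pkl file, but I can't work out what is happening or why changing the batch size changes the error.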

Any help would be appreciated!
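The batch-size dependence can be reproduced without Theano. In the tutorial-style fine-tuning code, the test function slices the test inputs and test labels with the same index * batch_size : (index + 1) * batch_size window. It loops over n_test_batches, which is computed from the number of test inputs divided by the batch size. The error node T.neq(self.y_pred, y) then compares one prediction per input row against the labels in that window. If the label array is shorter than the input array, the last windows return fewer labels, or none at all. This is what happens when test_set is built from test_set_x and val_set_y. The sketch below replays the question's slicing with plain lists for a hypothetical 1000-row CSV. The row count and the function name batch_shapes are illustrative assumptions.

```python
def batch_shapes(num_rows, batch_size):
    """Replay the question's split and tutorial-style minibatch slicing.

    Returns (n_predictions, n_labels) for every test minibatch whose sizes
    differ. Pure-Python stand-in; no Theano involved.
    """
    train_num = int(num_rows * 0.7)
    val_num = int(num_rows * 0.1)

    rows = list(range(num_rows))
    val_set_y = rows[train_num + 1:train_num + val_num]  # slicing as in the question
    test_set_x = rows[train_num + val_num + 1:]
    test_set_y = val_set_y                                # test inputs paired with val labels

    n_test_batches = len(test_set_x) // batch_size
    mismatches = []
    for index in range(n_test_batches):
        window = slice(index * batch_size, (index + 1) * batch_size)
        n_pred = len(test_set_x[window])  # the model predicts one label per input row
        n_true = len(test_set_y[window])  # labels available in the same window
        if n_pred != n_true:              # T.neq would fail on this minibatch
            mismatches.append((n_pred, n_true))
    return mismatches


# Hypothetical 1000-row CSV: 199 test inputs but only 99 labels.
print(batch_shapes(1000, 10)[0])  # (10, 9)
print(batch_shapes(1000, 1)[0])   # (1, 0)
```

The shapes follow the same pattern as the posted errors: a full batch of predictions against a short label slice, and against an empty slice once the batch size is 1.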

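I just came across this while searching for a similar error I was getting, so I'm posting a reply to help others who hit the same problem. For me, the error went away when I changed n_out in the dbn_test() parameter list from 1 to 2. n_out is the number of labels (classes), not the number of output layers.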
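In the tutorial code, the final layer is a LogisticRegression softmax with n_out units, one per class, and predictions are taken with an argmax. With n_out=1 that argmax is always 0, so the model can never predict class 1. The sketch below derives n_out from the labels. It assumes the labels are integer class indices 0..K-1, possibly stored as floats, as with the question's float32 labels. The helpers argmax and infer_n_out are hypothetical, written in plain Python for illustration.

```python
def argmax(values):
    """Index of the largest value (first one on ties), like argmax over one row."""
    return max(range(len(values)), key=lambda i: values[i])


def infer_n_out(labels):
    """Number of softmax outputs needed for integer class labels 0..K-1.

    Labels read from a CSV may be floats (the question casts them to float32),
    so each one must be a whole, non-negative number to serve as a class index.
    Assumes classes are numbered contiguously from 0.
    """
    classes = set()
    for value in labels:
        if float(value) != int(value) or value < 0:
            raise ValueError("label %r is not a non-negative integer class index" % (value,))
        classes.add(int(value))
    return max(classes) + 1


# A single-output softmax can only ever predict class 0.
print(argmax([0.42]))                     # 0
print(infer_n_out([0.0, 1.0, 1.0, 0.0]))  # 2
```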


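Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.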

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM