
How to write an ndarray to a text file in Python?

I am trying to use the MNIST data for my research work. The dataset description is:

The training_data is returned as a tuple with two entries. The first entry contains the actual training images. This is a numpy ndarray with 50,000 entries. Each entry is, in turn, a numpy ndarray with 784 values, representing the 28 * 28 = 784 pixels in a single MNIST image.

 The second entry in the ``training_data`` tuple is a numpy ndarray containing 50,000 entries. Those entries are just the digit values (0...9) for the corresponding images contained in the first entry of the tuple. 

Now I am converting the training data like this:

In particular, training_data is a list containing 50,000 2-tuples (x, y). x is a 784-dimensional numpy.ndarray containing the input image. y is a 10-dimensional numpy.ndarray representing the unit vector corresponding to the correct digit for x. The code for that is:

import numpy as np

# load_data() and vectorized_result() are defined elsewhere (not shown)
def load_data_nn():
    training_data, validation_data, test_data = load_data()
    # turn each 784-value image into a (784, 1) column
    inputs = [np.reshape(x, (784, 1)) for x in training_data[0]]
    print(inputs[0])
    # turn each digit label into a 10-dimensional unit vector
    results = [vectorized_result(y) for y in training_data[1]]
    training_data = zip(inputs, results)
    test_inputs = [np.reshape(x, (784, 1)) for x in test_data[0]]
    return (training_data, test_inputs, test_data[1])
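The helper vectorized_result is not shown above. A minimal sketch of what such a helper might look like, assuming it returns a (10, 1) column with a 1.0 at the digit's index; the shape and the body are assumptions, not the actual code from the loader:

```python
import numpy as np

def vectorized_result(digit):
    """Return a unit vector for `digit` (0..9) as a (10, 1) column.

    Hypothetical reconstruction: the original helper is not shown in the post.
    """
    unit = np.zeros((10, 1))
    unit[digit] = 1.0
    return unit

example = vectorized_result(3)
print(example.ravel())
```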

Now I want to write the inputs to a text file, so that one row is inputs[0], the next row is inputs[1], and so on. The values within each row should be space separated, with no ndarray brackets. For example:

 0 0.45 0.47 0.76

 0.78 0.34 0.35 0.56

Here one row in the text file is inputs[0]. How can I write the ndarray to a text file in this format?

Since the answer to your question seems quite easy, I guess your problem is speed. Fortunately we can use multiprocessing here. Try this:

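from multiprocessing import Pool

import numpy as np

def joinRow(row):
    # flatten first: each input has shape (784, 1), and iterating over it
    # directly would yield 1-element arrays that print with brackets
    return ' '.join(str(cell) for cell in np.ravel(row))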

def inputsToFile(inputs, filepath):
    # in python3 you can do:
    # with Pool() as p:
    #     lines = p.map(joinRow, inputs, chunksize=1000)
    # instead of code from here...
    p = Pool()
    try:
        lines = p.map(joinRow, inputs, chunksize=1000)
    finally:
        p.close()
    # ...to here. But this works for both.

    with open(filepath,'w') as f:
        f.write('\n'.join(lines)) # joining already created strings goes fast
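One thing to watch for with rows shaped (784, 1): iterating over such an array yields 1-element arrays, and str() of those includes brackets. A minimal sketch of a row formatter that flattens first; the name format_row and the tiny demo column are made up for illustration:

```python
import numpy as np

def format_row(row):
    """Return the values of `row` as one space-separated line (hypothetical helper)."""
    # np.ravel gives a flat view, so each item is a scalar, not a 1-element array
    return ' '.join(str(value) for value in np.ravel(row))

# a tiny stand-in for inputs[0]: shape (4, 1) instead of (784, 1)
column = np.array([[0.0], [0.45], [0.47], [0.76]])

naive = ' '.join(str(cell) for cell in column)  # iterates over 1-element arrays
flat = format_row(column)                       # iterates over scalars

print(naive)
print(flat)
```

The first line printed still contains brackets; the second does not.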

Still takes a while on my slow laptop, but is a lot faster than just '\n'.join(' '.join(str(cell) for cell in row) for row in inputs).
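If speed is not the main concern, NumPy's own np.savetxt can also produce this layout. A minimal sketch, assuming every entry of inputs has the same number of values; the function name inputs_to_file_savetxt and the demo data are made up for illustration:

```python
import os
import tempfile

import numpy as np

def inputs_to_file_savetxt(inputs, filepath):
    """Write one input per line, values separated by spaces (illustrative helper)."""
    # stack the inputs into one 2-D array with one row per image
    matrix = np.asarray(inputs).reshape(len(inputs), -1)
    # '%g' drops trailing zeros but keeps only about 6 significant digits
    np.savetxt(filepath, matrix, fmt='%g', delimiter=' ')

# two tiny (4, 1) columns standing in for the (784, 1) MNIST inputs
demo_inputs = [np.array([[0.0], [0.45], [0.47], [0.76]]),
               np.array([[0.78], [0.34], [0.35], [0.56]])]
demo_path = os.path.join(tempfile.mkdtemp(), 'inputs.txt')
inputs_to_file_savetxt(demo_inputs, demo_path)

with open(demo_path) as f:
    demo_lines = f.read().splitlines()
print(demo_lines)
```

Pass a different fmt, for example '%.18e', if the file must keep full precision.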

By the way, you can speed up the rest of your code as well:

import numpy as np

def load_data_nn():
    training_data, validation_data, test_data = load_data()
    # -1 lets NumPy infer the number of images, so this works for
    # shape (N, 784) as well as (N, 28, 28)
    inputs = training_data[0].reshape((-1, 784, 1))
    print(inputs[0])
    # create identity matrix and use entries of training_data[1] to
    # index corresponding unit vectors
    results = np.eye(10)[training_data[1]]
    training_data = zip(inputs, results)
    # do not hard-code 50000 here: the test set need not have the
    # same number of images as the training set
    test_inputs = test_data[0].reshape((-1, 784, 1))
    return (training_data, test_inputs, test_data[1])
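The identity-matrix indexing can be checked on a few labels. A minimal sketch with made-up labels and blank images; the final reshape to (10, 1) columns is only needed if the rest of your code expects column vectors, which is an assumption about vectorized_result:

```python
import numpy as np

labels = np.array([3, 0, 9])             # made-up digit labels
one_hot = np.eye(10)[labels]             # row i is the unit vector for labels[i]
one_hot_columns = one_hot.reshape((-1, 10, 1))  # optional: (10, 1) columns

images = np.zeros((3, 28, 28))           # blank stand-ins for MNIST images
columns = images.reshape((-1, 784, 1))   # -1 infers the number of images

print(one_hot.shape, one_hot_columns.shape, columns.shape)
```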
