Incrementally appending numpy.arrays to a save file

Question

I've tried the method outlined by Hpaulji, but it doesn't seem to be working:

How to append many numpy files into one numpy file in python

Basically, I'm iterating through a generator, making some changes to an array, and then trying to save each iteration's array.

Here is what my sample code looks like:

filename = 'testing.npy'

with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.save(filename, prediction)

        current_iteration += 1
    if current_iteration == 5:
        break

Here, I'm going through 5 iterations, so I was hoping to save 5 different arrays.

I printed out a portion of each array, for debugging purposes:

[ 0.  0.  0.  0.  0.]
[ 0.          3.37349415  0.          0.          1.62561738]
[  0.          20.28489304   0.           0.           0.        ]
[ 0.  0.  0.  0.  0.]
[  0.          21.98013496   0.           0.           0.        ]

But when I tried to load the array multiple times, as described in "How to append many numpy files into one numpy file in python", I got an EOFError:

file = r'testing.npy'

with open(file,'rb') as f:
    arr = np.load(f)
    print(arr[0,0,0,0:5])
    arr = np.load(f)
    print(arr[0,0,0,0:5])

It only outputs the last array and then raises an EOFError:

[  0.          21.98013496   0.           0.           0.        ]
EOFError: Ran out of input

print(arr[0,0,0,0:5])

I was expecting all 5 arrays to be saved, but when I load the saved .npy file multiple times, I only get the last array.

So, how should I be saving and appending new arrays to a file?

EDIT: Testing with '.npz' only saves last array

filename = 'testing.npz'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.savez(f, prediction)



        current_iteration += 1
        if current_iteration == 5:
            break


# loading

file = 'testing.npz'

with open(file,'rb') as f:
    arr = np.load(f)
    print(arr.keys())


>>>['arr_0']

Answer 1

All your calls to np.save use the filename, not the filehandle. Since you do not reuse the filehandle, each save overwrites the file instead of appending the array to it.

This should work:

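import numpy as np

filename = 'testing.npy'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        # Pass the open file handle, not the filename, so each array is appended
        np.save(f, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break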
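
To check the handle-based approach end to end, here is a minimal, self-contained sketch of the save-then-load round trip. It makes some assumptions: synthetic arrays stand in for the output of base_model.predict, the helper names save_batches and load_batches are hypothetical, and the file lives in a temporary directory. The loader compares the read position against the file size instead of relying on an exception to detect the end of the file.

```python
import os
import tempfile

import numpy as np


def save_batches(path, batches):
    """Append every array to the same open file handle, one after another."""
    with open(path, "wb") as f:
        for batch in batches:
            np.save(f, batch)  # file handle, not the filename


def load_batches(path):
    """Call np.load repeatedly on one handle until the whole file is consumed."""
    arrays = []
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while f.tell() < size:
            arrays.append(np.load(f))
    return arrays


# Synthetic stand-ins for five prediction batches (assumption, not real model output)
batches = [np.full((2, 3), i, dtype=np.float32) for i in range(5)]
path = os.path.join(tempfile.mkdtemp(), "testing.npy")

save_batches(path, batches)
loaded = load_batches(path)
print(len(loaded))
```

Each np.load call picks up where the previous one stopped, so the arrays come back in the order they were written.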

While there may be advantages to storing multiple arrays in one .npy file (I imagine it helps when memory is limited), .npy files are technically meant to store a single array. To store multiple arrays, you can use .npz files (np.savez or np.savez_compressed):

from itertools import islice

filename = 'testing.npz'
predictions = []
# islice stops after 5 batches without pulling an extra one from the generator
for x, _ in islice(train_generator, 5):
    prediction = base_model.predict(x)
    predictions.append(prediction)
np.savez(filename, predictions) # will name it arr_0
# np.savez(filename, predictions=predictions) # would name it predictions
# np.savez(filename, *predictions) # would name it arr_0, arr_1, …, arr_4
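
For completeness, a short sketch of reading such an .npz file back, using the unpacked variant so that every array gets its own key. The arrays here are synthetic placeholders rather than real predictions, and the temporary path is only for illustration.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-ins for five prediction batches (assumption, not real model output)
predictions = [np.full((2, 3), i, dtype=np.float32) for i in range(5)]
path = os.path.join(tempfile.mkdtemp(), "testing.npz")

# Unpacking the list stores each array under its own key: arr_0 ... arr_4
np.savez(path, *predictions)

# np.load returns a dict-like archive; .files lists the stored names
with np.load(path) as data:
    names = list(data.files)
    restored = [data[f"arr_{i}"] for i in range(len(names))]

print(names)
```

Note that this variant keeps all arrays in memory until the single np.savez call, unlike the handle-based .npy approach, which writes each array as soon as it is produced.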

solution1
3 ACCPTED 2018-02-04 00:38:01
