Incrementally appending numpy.arrays to a save file

I've tried this method outlined by Hpaulji, but it doesn't seem to be working:

How to append many numpy files into one numpy file in python

Basically, I'm iterating through a generator, making some changes to an array, and then trying to save each iteration's array.

Here is what my sample code looks like:

filename = 'testing.npy'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.save(filename, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break

Here, I'm going through 5 iterations, so I was hoping to save 5 different arrays.

I printed out a portion of each array, for debugging purposes:

[ 0.  0.  0.  0.  0.]
[ 0.          3.37349415  0.          0.          1.62561738]
[  0.          20.28489304   0.           0.           0.        ]
[ 0.  0.  0.  0.  0.]
[  0.          21.98013496   0.           0.           0.        ]

But when I try to load the arrays by calling np.load multiple times, as described in "How to append many numpy files into one numpy file in python", I get an EOFError:

file = r'testing.npy'

with open(file,'rb') as f:
    arr = np.load(f)
    print(arr[0,0,0,0:5])
    arr = np.load(f)
    print(arr[0,0,0,0:5])

It only outputs the last array and then raises an EOFError:

[  0.          21.98013496   0.           0.           0.        ]
EOFError: Ran out of input

print(arr[0,0,0,0:5])

I was expecting all 5 arrays to be saved, but when I load the saved .npy file multiple times, I only get the last array.

So, how should I be saving and appending new arrays to a file?

EDIT: Testing with '.npz' only saves last array

filename = 'testing.npz'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        np.savez(f, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break

# loading
file = 'testing.npz'

with open(file, 'rb') as f:
    arr = np.load(f)
    print(arr.keys())


>>>['arr_0']

All your calls to np.save use the filename, not the filehandle. Since you do not reuse the filehandle, each save overwrites the file instead of appending the array to it.

This should work:

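import numpy as np

filename = 'testing.npy'

current_iteration = 0
with open(filename, 'wb') as f:
    for x, _ in train_generator:
        prediction = base_model.predict(x)
        print(prediction[0,0,0,0:5])
        # pass the open file handle, not the filename, so each array is appended
        np.save(f, prediction)

        current_iteration += 1
        if current_iteration == 5:
            break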
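To check the round trip without a model or a generator, here is a minimal sketch. The small constant arrays and the temporary path are stand-ins I made up for illustration; they are not part of the original code. It writes five arrays through one file handle and then calls np.load once per stored array, using the file position to decide when to stop instead of waiting for an exception at the end of the file.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the model predictions in the question.
arrays = [np.full((2, 3), i, dtype=np.float32) for i in range(5)]

path = os.path.join(tempfile.mkdtemp(), "testing.npy")

# Write: every np.save call receives the same open handle, so each array
# is written after the previous one instead of replacing it.
with open(path, "wb") as f:
    for arr in arrays:
        np.save(f, arr)

# Read: call np.load once per stored array. Comparing the file position
# with the file size avoids relying on the error raised at end of file.
loaded = []
size = os.path.getsize(path)
with open(path, "rb") as f:
    while f.tell() < size:
        loaded.append(np.load(f))

print(len(loaded))
```

Each np.load call on the handle reads exactly one array and leaves the position at the start of the next one, which is why the loading code in the question has to call it once per saved array.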

There may be advantages to storing multiple arrays back to back in one .npy file (I imagine it helps in situations where memory is limited), but the .npy format is technically meant to store a single array. To store multiple arrays in one file, you can use .npz files (np.savez or np.savez_compressed) instead:

filename = 'testing.npz'
predictions = []
for (x, _), index in zip(train_generator, range(5)):
    prediction = base_model.predict(x)
    predictions.append(prediction)
np.savez(filename, predictions) # will name it arr_0
# np.savez(filename, predictions=predictions) # would name it predictions
# np.savez(filename, *predictions) # would name it arr_0, arr_1, …, arr_4
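As a rough illustration of how the naming variants differ when the file is read back, the sketch below saves a list of dummy arrays in two ways and loads both files. The constant arrays and temporary paths are assumptions made up for the example; in the real code the list would hold the model predictions.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for five prediction batches of the same shape.
predictions = [np.full((2, 3), i, dtype=np.float32) for i in range(5)]

tmpdir = tempfile.mkdtemp()
stacked_path = os.path.join(tmpdir, "stacked.npz")
separate_path = os.path.join(tmpdir, "separate.npz")

# Passing the list itself stores one array (the list is converted to a
# single array of shape (5, 2, 3)) under the default key arr_0.
np.savez(stacked_path, predictions)

# Unpacking the list stores each batch under its own key: arr_0 ... arr_4.
np.savez(separate_path, *predictions)

with np.load(stacked_path) as data:
    stacked_keys = list(data.files)
    stacked_shape = data["arr_0"].shape

with np.load(separate_path) as data:
    separate_keys = list(data.files)
    restored = [data[key] for key in separate_keys]

print(stacked_keys, stacked_shape)
print(separate_keys)
```

Passing the list as a single argument only works cleanly when every batch has the same shape; if the shapes differ, unpacking with * keeps each batch as its own entry.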
