
Numpy: change elements based on a threshold, then do element-by-element addition

I have 3247 matrices, each 197x10. I need to scan through them and, if a value is above 1, set it to 1; if a value is less than or equal to 1, set it to 0. Then I have to take each modified matrix and add it, element by element, to the running sum of the other 3246 modified matrices. Here is what I have so far:

for i in range(LOWER, UPPER + 1):
    fname = file_name + str(i) + ".txt"
    cur_resfile = np.genfromtxt(fname, delimiter=",", skip_header=1)
    m_cur = cur_resfile

    m_cur[m_cur <= 1] = 0
    m_cur[m_cur > 1] = 1

    m_ongoing = m_ongoing + m_cur

I want m_ongoing to hold the running sums so that I can save them to a file. However, it's not working: it seems to just end up holding the last m_cur from the loop. If I run the loop a total of 3 times, some cells have 1s in all three matrices, so I would expect a few 3s. I definitely expect a lot of 2s, but I'm only seeing 1s and 0s.

What is the best way to do what I'm trying to do?

- Change values based on a condition.

- Take a lot of matrices and add them all, element by element, to create running sums for each cell.

You could use numpy.clip(). One caveat: clip(0, 1) leaves values strictly between 0 and 1 unchanged and maps exactly 1 to 1, so it only matches your stated rule (everything ≤ 1 becomes 0) if the data never falls in (0, 1]; the EDIT below implements the rule exactly. Also, initialize m_ongoing to zeros once, before the loop: ending up with just the last m_cur is the usual symptom of re-binding or re-creating it inside the loop.

m_ongoing = np.zeros((197, 10))  # initialize the accumulator once, outside the loop

for i in range(LOWER, UPPER + 1):
    fname = file_name + str(i) + ".txt"
    cur_resfile = np.genfromtxt(fname, delimiter=",", skip_header=1)

    m_ongoing += cur_resfile.clip(0, 1)
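
For context, here is a minimal sketch (with made-up values) of where clip(0, 1) and the stated threshold rule give different answers:

import numpy as np

a = np.array([-0.5, 0.3, 1.0, 2.7])
print(a.clip(0, 1))         # [0.  0.3 1.  1. ]  -- values in (0, 1] survive
print((a > 1).astype(int))  # [0 0 0 1]          -- "<= 1 becomes 0, > 1 becomes 1"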

EDIT: Answering the question that was asked:

m_ongoing = np.zeros((197, 10))

for i in range(LOWER, UPPER + 1):
    fname = file_name + str(i) + ".txt"
    cur_resfile = np.genfromtxt(fname, delimiter=",", skip_header=1)

    # add one in the places where cur_resfile > 1
    m_ongoing[cur_resfile > 1] += 1
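
Since the goal is to save the running sums to a file, here is a minimal sketch using np.savetxt (the output filename is made up):

# write the 197x10 running sums as comma-separated integers
np.savetxt("running_sums.txt", m_ongoing, delimiter=",", fmt="%d")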

As @RootTwo suggests, clip() is a nice numpy built-in. But for performance reasons, you can use vectorized operations on a 3D "stack" of your data.

Example:

import numpy as np

# simulate your data as a list of 3247 2D matrices, each 197x10
some_data = [np.random.randint(-2, 2, (197, 10)) for _i in range(3247)]

# stack the matrices along a third (depth) axis
X = np.dstack(some_data)
print(X.shape)

(197, 10, 3247)

Y = X.clip(0, 1)
Z = Y.sum(axis=2)
# Z is now the output you want!
print(Z.shape)

(197, 10)
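
One caveat, as with the loop version: clip(0, 1) keeps values that lie in (0, 1], while the question's rule sends everything ≤ 1 to 0. The strict test works on the stack too, as a one-line sketch:

# count, per cell, how many of the 3247 matrices exceed 1
Z_exact = (X > 1).sum(axis=2)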

EDIT: Adding Timing Results, and Changing my Answer

So it appears my advice to create a depth stack and apply clip and sum in a single shot was ill-advised. I ran some timing tests and found that the incremental method is faster, most likely due to the overhead of allocating the big 3D array.

Here are the tests, with the data-loading step factored out since it would be the same either way, followed by results comparing the two methods in IPython with the %timeit magic.

import numpy as np
# some_data is simulated as in the above code sample

def f1(some_data):
    # incremental: clip each matrix and accumulate into a running sum
    x = some_data[0].clip(0, 1)
    for y in some_data[1:]:
        x += y.clip(0, 1)
    return x

def f2(some_data):
    # stacked: build one 3D array, then clip and sum along the depth axis
    X = np.dstack(some_data)
    X = X.clip(0, 1)
    X = X.sum(axis=2)
    return X

%timeit x1 = f1(some_data)

10 loops, best of 3: 28.1 ms per loop

%timeit x2 = f2(some_data)

10 loops, best of 3: 103 ms per loop

So that's a 3.7x speedup by doing the process incrementally vs. as a single operation after stacking the data.
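
If allocation overhead is indeed the cost driver, a further variant (just a sketch; I have not timed it) is to reuse one scratch buffer via the out= argument of np.clip, so the loop allocates no temporary per iteration:

import numpy as np

def f3(some_data):
    # incremental, but clip into a reusable scratch buffer instead of
    # allocating a fresh clipped copy on every iteration
    total = np.zeros_like(some_data[0])
    buf = np.empty_like(some_data[0])
    for y in some_data:
        np.clip(y, 0, 1, out=buf)  # clipped copy of y written into buf
        total += buf
    return total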
