
Slicing a pattern in a large 1-d NumPy array

I have a 1-d array with a repeating pattern in its entries. For example, in the array arr, the first 4 entries have single digits, the next 4 entries have two digits, and the next 6 entries have three digits. (The single, double, triple digit scheme is just to highlight the pattern. The actual array holds float numbers of similar values.) The example 1-d array looks like:

import numpy as np
arr = np.array([1, 2, 3, 4, 11, 12, 13, 14, 111, 123, 132, 145, 176, 129, 
                6, 5, 3, 2, 21, 82, 53, 34, 121, 133, 139, 165, 186, 119])

Now, one complete pattern has a total of 4+4+6 = 14 entries. This pattern (or repeating unit) is repeated several hundred thousand times, so the length of my array is a multiple of 14 (14 * 2 = 28 in the example arr above).

Question:

I want to extract all the one-digit entries (the first 4 numbers of each repeating unit), all the two-digit entries (the next 4 numbers of each repeating unit), and all the three-digit entries (the last 6 numbers of each repeating unit).

This way I want to split my big arr into three 1-d arrays. So the desired output is

arr1 = array([1, 2, 3, 4, 6, 5, 3, 2])
arr2 = array([11, 12, 13, 14, 21, 82, 53, 34])
arr3 = array([111, 123, 132, 145, 176, 129, 121, 133, 139, 165, 186, 119])

My idea

One way could be to simply reshape it into a 2-d array, since I know the number of repetitions (= 28/14 = 2 in the example arr), then use slicing to get all the chunks of 4, 4 and 6, and then concatenate.

arr = arr.reshape(2, 14)

and then use slicing to get the chunks as

arr1 = np.concatenate(arr[:, 0:4])
arr2 = np.concatenate(arr[:, 4:8])
arr3 = np.concatenate(arr[:, 8:])
print(arr1, arr2, arr3)

# [1 2 3 4 6 5 3 2] [11 12 13 14 21 82 53 34] [111 123 132 145 176 129 121 133 139 165 186 119]

But I am interested in knowing an alternative and efficient solution using some sort of masking and slicing without converting first to a 2-d array.

Using a mask of the pattern, as requested (and assuming that the length of arr is an exact multiple of the mask length):

mask1 = [True]*4 + [False]*10
mask2 = [False]*4 + [True]*4 + [False]*6
mask3 = [False]*8 + [True]*6

Then you directly get the desired arrays by doing:

n_masks = (len(arr) // len(mask1))
arr1 = arr[mask1 * n_masks]
arr2 = arr[mask2 * n_masks]
arr3 = arr[mask3 * n_masks]
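The same idea can be sketched with NumPy boolean arrays instead of Python lists, which avoids building a very long list when the pattern repeats hundreds of thousands of times. This is only an illustration: the names period, unit1, unit2, unit3 and n_units are made up here, and it still assumes the array length is an exact multiple of the pattern length.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 11, 12, 13, 14, 111, 123, 132, 145, 176, 129,
                6, 5, 3, 2, 21, 82, 53, 34, 121, 133, 139, 165, 186, 119])

period = 14                      # length of one repeating unit (4 + 4 + 6)
n_units = arr.size // period     # assumes arr.size is a multiple of period

# One boolean mask per chunk, covering a single repeating unit.
unit1 = np.zeros(period, dtype=bool)
unit1[0:4] = True
unit2 = np.zeros(period, dtype=bool)
unit2[4:8] = True
unit3 = np.zeros(period, dtype=bool)
unit3[8:14] = True

# np.tile repeats each unit mask so that it spans the whole array.
arr1 = arr[np.tile(unit1, n_units)]
arr2 = arr[np.tile(unit2, n_units)]
arr3 = arr[np.tile(unit3, n_units)]
```

Boolean indexing always returns a copy, so the three results are independent of arr.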

You could access the indices directly:

import numpy as np
arr = np.array([1, 2, 3, 4, 11, 12, 13, 14, 111, 123, 132, 145, 176, 129,
                6, 5, 3, 2, 21, 82, 53, 34, 121, 133, 139, 165, 186, 119])

run_length = 14
repetitions = 2

indices1 = [run_length * i + j for i in range(repetitions) for j in range(4)]
arr1 = arr[indices1]

indices2 = [run_length * i + j for i in range(repetitions) for j in range(4, 8)]
arr2 = arr[indices2]

indices3 = [run_length * i + j for i in range(repetitions) for j in range(8, 14)]
arr3 = arr[indices3]

print(arr1)
print(arr2)
print(arr3)

Output

[1 2 3 4 6 5 3 2]
[11 12 13 14 21 82 53 34]
[111 123 132 145 176 129 121 133 139 165 186 119]

You could put everything in a function like this:

import numpy as np
arr = np.array([1, 2, 3, 4, 11, 12, 13, 14, 111, 123, 132, 145, 176, 129,
                6, 5, 3, 2, 21, 82, 53, 34, 121, 133, 139, 165, 186, 119])


def extract(arr, run_length, repetitions, pattern_lengths):
    chunks = [0] + np.cumsum(pattern_lengths).tolist()

    for start, end in zip(chunks, chunks[1:]):
        indices = [run_length * i + j for i in range(repetitions) for j in range(start, end)]
        yield arr[indices]


arr1, arr2, arr3 = list(extract(arr, 14, 2, [4, 4, 6]))

print(arr1)
print(arr2)
print(arr3)
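For very long arrays, the Python-level list comprehension inside extract may become the bottleneck. A possible variant, sketched below, builds the same indices with broadcasting. The function name extract_vectorized is hypothetical, and it derives the number of repetitions from the array length, assuming that length is an exact multiple of run_length.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 11, 12, 13, 14, 111, 123, 132, 145, 176, 129,
                6, 5, 3, 2, 21, 82, 53, 34, 121, 133, 139, 165, 186, 119])


def extract_vectorized(arr, run_length, pattern_lengths):
    """Hypothetical helper: return one 1-d array per chunk of the pattern."""
    arr = np.asarray(arr)
    repetitions = arr.size // run_length  # assumes an exact multiple
    # Chunk boundaries inside one repeating unit, e.g. [0, 4, 8, 14].
    bounds = np.concatenate(([0], np.cumsum(pattern_lengths)))
    # Column vector holding the start offset of every repeating unit.
    unit_starts = run_length * np.arange(repetitions)[:, None]
    parts = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Broadcasting gives a (repetitions, end - start) grid of indices;
        # ravel() flattens it unit by unit, preserving the original order.
        indices = (unit_starts + np.arange(start, end)).ravel()
        parts.append(arr[indices])
    return parts


arr1, arr2, arr3 = extract_vectorized(arr, 14, [4, 4, 6])
```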

You can also build the mask:

# if you know where your indices are, otherwise use a formula
mask = np.zeros((3, 2, 14), dtype=bool)
mask[0,:, 0:4] = True
mask[1,:, 4:8] = True
mask[2,:, 8:] = True

arr1, arr2, arr3 = (arr[m.flatten()] for m in mask)
print (arr1, arr2, arr3)
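One possible "formula" for the mask, when you do not want to hard-code the number of repetitions, is the position of each element inside its repeating unit, obtained with a modulo. This is a sketch under the assumption that the pattern starts at index 0. The name pos is illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 11, 12, 13, 14, 111, 123, 132, 145, 176, 129,
                6, 5, 3, 2, 21, 82, 53, 34, 121, 133, 139, 165, 186, 119])

# Position of every element within its 14-element repeating unit (0..13).
pos = np.arange(arr.size) % 14

arr1 = arr[pos < 4]                  # positions 0-3
arr2 = arr[(pos >= 4) & (pos < 8)]   # positions 4-7
arr3 = arr[pos >= 8]                 # positions 8-13
```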

We could simply reshape into 2D (reshaping a contiguous array creates a view, so it has no memory overhead and is virtually free at runtime), with the number of columns equal to the pattern length (14 in the sample case). Then slice out the first 4 columns for the first output array, the next 4 for the second, and the columns from index 8 onwards for the last one.

Since we need flattened output, we can get it with .ravel() (which has to copy here, because the column slices are not contiguous in memory).

Hence -

In [44]: a2d = arr.reshape(-1,14) # 2d view into arr
    ...: arr1,arr2,arr3 = a2d[:,:4].ravel(),a2d[:,4:8].ravel(),a2d[:,8:].ravel()

In [45]: arr1
Out[45]: array([1, 2, 3, 4, 6, 5, 3, 2])

In [46]: arr2
Out[46]: array([11, 12, 13, 14, 21, 82, 53, 34])

In [47]: arr3
Out[47]: array([111, 123, 132, 145, 176, 129, 121, 133, 139, 165, 186, 119])
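If the pattern has more than three chunks, the column slices do not have to be written out by hand. A minimal sketch, assuming the chunk boundaries inside one unit are known (here columns 4 and 8), is to let np.split cut the 2D view along axis 1. The name parts is illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 11, 12, 13, 14, 111, 123, 132, 145, 176, 129,
                6, 5, 3, 2, 21, 82, 53, 34, 121, 133, 139, 165, 186, 119])

a2d = arr.reshape(-1, 14)  # one row per repeating unit

# Split the columns at indices 4 and 8, giving blocks of width 4, 4 and 6,
# then flatten each block row by row.
parts = [block.ravel() for block in np.split(a2d, [4, 8], axis=1)]
arr1, arr2, arr3 = parts
```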

Now, say we are okay with 2D array outputs, then -

In [48]: arr1,arr2,arr3 = a2d[:,:4],a2d[:,4:8],a2d[:,8:]

In [49]: arr1
Out[49]: 
array([[1, 2, 3, 4],
       [6, 5, 3, 2]])

In [50]: arr2
Out[50]: 
array([[11, 12, 13, 14],
       [21, 82, 53, 34]])

In [51]: arr3
Out[51]: 
array([[111, 123, 132, 145, 176, 129],
       [121, 133, 139, 165, 186, 119]])

So why take this approach? Because each 2D slice is a view into the original input arr and hence, as mentioned earlier, has zero memory overhead and is virtually free -

In [52]: np.shares_memory(arr,arr1)
Out[52]: True

and likewise for the other two arrays.
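The difference between the 2D slices and their flattened versions can be checked directly. The sketch below works on a copy of the sample data (named base here, an illustrative name) so that the write-through test does not disturb arr.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 11, 12, 13, 14, 111, 123, 132, 145, 176, 129,
                6, 5, 3, 2, 21, 82, 53, 34, 121, 133, 139, 165, 186, 119])

base = arr.copy()            # work on a copy so arr itself is untouched
a2d = base.reshape(-1, 14)

view2d = a2d[:, :4]          # 2D slice: a view, no data copied
flat = view2d.ravel()        # non-contiguous slice, so ravel() must copy

is_view = np.shares_memory(base, view2d)   # True
is_copy = not np.shares_memory(base, flat) # True

# Writing through the 2D view changes the underlying array ...
view2d[0, 0] = 100
changed_base = base[0]       # 100
# ... while the flattened copy keeps the old value.
unchanged_flat = flat[0]     # 1
```

So the zero-overhead property holds for the 2D outputs. The flattened 1-d outputs cost one copy each, which is the same cost the boolean-mask and index-list approaches pay.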
