Stretch an array and fill nan

I have a 1-d numpy array of length n, and I want to stretch it to length m (n < m) by inserting numpy.nan at regular intervals.

For example:

>>> arr = [4,5,1,2,6,8] # take this
>>> stretch(arr,8)
[4,5,np.nan,1,2,np.nan,6,8] # convert to this

Requirements:

1. No nans at either end (if possible)
2. Works for all lengths

I've tried

>>> def stretch(x,to,fill=np.nan):
...     step = to/len(x)
...     output = np.repeat(fill,to)
...     foreign = np.arange(0,to,step).round().astype(int)
...     output[foreign] = x
...     return output

>>> arr = np.random.rand(6553)
>>> stretch(arr,6622)

  File "<ipython-input-216-0202bc39278e>", line 2, in <module>
    stretch(arr,6622)

  File "<ipython-input-211-177ee8bc10a7>", line 9, in stretch
    output[foreign] = x

ValueError: shape mismatch: value array of shape (6553,) could not be broadcast to indexing result of shape (6554,)

This does not work properly: it fails for an array of length 6553 (violating requirement 2) and does not guarantee requirement 1. Any clues on how to overcome this?
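
The shape mismatch above can be traced to np.arange being called with a non-integer step: its length is then derived from a floating-point division and can come out one larger than len(x). The NumPy documentation suggests np.linspace for such cases, because it takes the number of points directly. A minimal sketch of the difference, using the sizes from the failing call (the names by_arange and by_linspace are only illustrative):

```python
import numpy as np

n, to = 6553, 6622          # sizes from the failing call in the question
step = to / n               # non-integer step, roughly 1.0105

# With a float step, the length of np.arange is ceil((stop - start) / step),
# so rounding error can produce one index more than there are values.
by_arange = np.arange(0, to, step)

# np.linspace is given the number of points, so its length is always n,
# and both end points (0 and to - 1) are included.
by_linspace = np.linspace(0, to - 1, n)

# The traceback in the question reports 6554 for the first length.
print(len(by_arange), len(by_linspace))
```

Pinning the last index to to - 1 also places the last value of x at the end of the output, which is what requirement 1 asks for.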

Using roundrobin from the itertools recipes:

from itertools import cycle, islice

import numpy as np

def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            pending -= 1
            nexts = cycle(islice(nexts, pending))

def stretch(x, to, fill=np.nan):
    n_gaps = to - len(x)
    return np.hstack([*roundrobin(np.array_split(x, n_gaps+1), np.repeat(fill, n_gaps))])

arr = [4,5,1,2,6,8]
stretch(arr, 8)
# array([ 4.,  5., nan,  1.,  2., nan,  6.,  8.])

arr2 = np.random.rand(655)
stretched_arr2 = stretch(arr2, 662)
np.diff(np.argwhere(np.isnan(stretched_arr2)), axis=0)
# nans are evenly spaced
# array([[83],
#        [83],
#        [83],
#        [83],
#        [83],
#        [83]])

Logic behind

n_gaps : the number of gaps to fill (desired length - current length)

np.array_split : called with n_gaps+1, it splits the input array into chunks of as equal length as possible

roundrobin : since np.array_split produces one more chunk than there are gaps, round-robining (i.e. iterating alternately over the chunks and the fill values) guarantees that np.nan is never at either end of the result, as long as no chunk is empty.
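
To make the three steps concrete, here is a small sketch on the 6-element example. It swaps roundrobin for np.insert, which is a different way of interleaving and not the method used in the answer above; the variable names sizes and cuts are only illustrative.

```python
import numpy as np

x = np.array([4, 5, 1, 2, 6, 8], dtype=float)
to = 8
n_gaps = to - len(x)                      # 2 fill values are needed

# Split into n_gaps + 1 chunks of near-equal length.
pieces = np.array_split(x, n_gaps + 1)
sizes = [len(p) for p in pieces]          # chunk lengths: 2, 2, 2

# Boundaries between consecutive chunks, as indices into the original array.
cuts = np.cumsum(sizes)[:-1]              # positions 2 and 4

# np.insert puts one fill value before each boundary index, so the fills
# land between chunks and never before the first or after the last chunk.
stretched = np.insert(x, cuts, np.nan)
print(stretched)                          # 4, 5, nan, 1, 2, nan, 6, 8
```

Because the boundaries always fall between two non-empty chunks here, both ends of the result hold original values.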

This approach places the non-nan elements at the boundaries, leaving nan values at the center, although it won't evenly space the nan values.

import numpy as np

arr = [4,5,1,2,6,8]
stretch_len = 8

def stretch(arr, stretch_len):
    stretched_arr = np.empty(stretch_len)   
    stretched_arr.fill(np.nan)
    arr_len = len(arr)

    if arr_len % 2 == 0:
        mid = int(arr_len/2)
        stretched_arr[:mid] = arr[:mid]
        stretched_arr[-mid:] = arr[-mid:]
    else:
        mid = int(np.floor(arr_len/2))
        stretched_arr[:mid] = arr[:mid]
        stretched_arr[-mid-1:] = arr[-mid-1:]

    return stretched_arr

Here are the test cases I tried:

In [104]: stretch(arr, stretch_len)   
Out[104]: array([ 4.,  5.,  1., nan, nan,  2.,  6.,  8.])

In [105]: arr = [4, 5, 1, 2, 6, 8, 9]    

In [106]: stretch(arr, stretch_len)  
Out[106]: array([ 4.,  5.,  1., nan,  2.,  6.,  8.,  9.])

In [107]: stretch(arr, 9)  
Out[107]: array([ 4.,  5.,  1., nan, nan,  2.,  6.,  8.,  9.])
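
The even and odd branches above differ only in how many elements go to the back, so the same idea can probably be written without the branch. A minimal sketch, assuming stretch_len >= len(arr); stretch_center is a hypothetical name, not part of the answer:

```python
import numpy as np

def stretch_center(arr, stretch_len, fill=np.nan):
    """Keep the first half at the front, the rest at the back, fill the middle."""
    arr = np.asarray(arr, dtype=float)
    n = len(arr)
    head = n // 2                          # elements kept at the front
    tail = n - head                        # elements kept at the back
    out = np.full(stretch_len, fill)
    out[:head] = arr[:head]
    out[stretch_len - tail:] = arr[head:]  # assumes stretch_len >= n
    return out

print(stretch_center([4, 5, 1, 2, 6, 8], 8))
print(stretch_center([4, 5, 1, 2, 6, 8, 9], 9))
```

For odd lengths the extra element goes to the back, matching the test cases above.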

Although Chris resolved the problem, I found a shorter answer, which may be helpful:

def stretch2(x,to,fill=np.nan):
    output  = np.repeat(fill,to)
    foreign = np.linspace(0,to-1,len(x)).round().astype(int)
    output[foreign] = x
    return output

This is very similar to my first attempt. Timings:

>>> x = np.random.rand(1000)
>>> to = 1200
>>> %timeit stretch(x,to) # Chris' version
>>> %timeit stretch2(x,to)

996 µs ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
32.2 µs ± 339 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Check if it works properly:

>>> aa = stretch2(x,to)
>>> np.diff(np.where(np.isnan(aa))[0])
array([6, 6, 6, ... , 6])
>>> np.sum(aa[~np.isnan(aa)] - x)
0.0

Check boundary conditions:

>>> aa[:5]
array([0.78581616, 0.1630689 , 0.52039993,        nan, 0.89844404])
>>> aa[-5:]
array([0.7063653 ,        nan, 0.2022172 , 0.94604503, 0.91201897])

All requirements are satisfied. This works for all 1-d arrays, and can be generalized to n-d arrays with a few changes.
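
The n-d generalization is not spelled out here. One possible sketch, assuming the stretch is applied along a single axis and that to is at least the length of that axis (stretch_axis and its axis parameter are hypothetical names):

```python
import numpy as np

def stretch_axis(x, to, axis=0, fill=np.nan):
    """Stretch x to length `to` along `axis`, filling the new slots with `fill`."""
    x = np.asarray(x, dtype=float)
    n = x.shape[axis]
    shape = list(x.shape)
    shape[axis] = to
    out = np.full(shape, fill)
    # Same index computation as stretch2: n evenly spread target positions.
    idx = np.linspace(0, to - 1, n).round().astype(int)
    # np.moveaxis returns views, so assigning through it writes into `out`.
    np.moveaxis(out, axis, 0)[idx] = np.moveaxis(x, axis, 0)
    return out

x2d = np.arange(6).reshape(2, 3)
print(stretch_axis(x2d, 5, axis=1))   # each row stretched from 3 to 5 columns
```

With axis=0 and a 1-d input this reduces to the same result as stretch2.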

You can use resize to resize the array.

Once it is resized, you can apply appropriate logic to rearrange the contents.

See: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.resize.html
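
A short sketch of what resize does in this situation, as far as the documented behaviour goes: ndarray.resize enlarges the array in place and fills the new entries with zeros, not nan, and it appends them at the end. The refcheck=False argument is used here only so the call also works in interactive sessions where the array is referenced elsewhere.

```python
import numpy as np

a = np.array([4, 5, 1, 2, 6, 8], dtype=float)
n = len(a)

# In-place enlargement; the two new trailing entries are zero-filled.
a.resize(8, refcheck=False)

# Replace the zero padding with the desired fill value.
a[n:] = np.nan
print(a)   # original values first, then the nan padding at the end
```

The fill values still sit at the end at this point, so the rearranging step mentioned above remains to be done, for example with an index computation like the one in stretch2.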
