
Best way to resize a data sequence in Python

I've got a data sequence (a list) that I have to resize. I've written a function for it, but it's very crude. Does anyone know of a better way to solve this?

Expected behaviour:

In all examples, my input data sequence is the following. (Edit: even though the example is linear, you can't assume that the sequence is built by a formula.)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

When I resize it from 10 items to 5, I expect something like the following output:

[1, 3, 5, 7, 9] or [2, 4, 6, 8, 10]

Now, all this isn't very difficult when you halve the length of the data sequence, but the size of my output sequence is variable: it could be smaller or larger than the length of the original sequence.

When I resize it from 10 items to 19 (an easy number to do manually), I expect something like this:

[1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10]

Current function

def sequenceResize(source, length):
    """
    Crude way of resizing a data sequence.
    Shrinking is here a lot more accurate than expanding.
    """
    sourceLen = len(source)
    out = []
    for i in range(length):
        key = int(i * (sourceLen / length))
        if key >= sourceLen:
            key = sourceLen - 1

        out.append(source[key])
    return out

This results in the following:

>>> sequenceResize(sequence, 5)
[1, 3, 5, 7, 9]
>>> sequenceResize(sequence, 19)
[1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10]

Shrinking is accurate, but expanding the sequence is not so great.

Does anyone know of an existing or simple way to tackle this problem properly?

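You can use np.linspace (note that it only uses the first and last values, so it reproduces the sequence only when the values are evenly spaced, as in the example):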

import numpy as np

list_in = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

resize = 19

# `resize` evenly spaced values from the first to the last element (inclusive)
list_out = np.linspace(list_in[0], list_in[-1], num=resize)

print(list_out.tolist())

Output:

[1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0]
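To resample a sequence whose values are not evenly spaced, one option (a swapped-in technique, not part of the answer above) is to interpolate between all of the original points with np.interp instead of only the endpoints. A minimal sketch, assuming a non-empty input; the helper name resize_interp is hypothetical:

```python
import numpy as np


def resize_interp(source, length):
    """Linearly resample `source` to `length` points (hypothetical helper).

    Assumes `source` is non-empty and `length` >= 1.
    """
    values = np.asarray(source, dtype=float)
    # Positions of the original samples: 0, 1, ..., n - 1
    xp = np.arange(len(values))
    # Evenly spaced positions for the new samples over the same index range
    x = np.linspace(0, len(values) - 1, num=length)
    # Piecewise-linear interpolation between neighbouring original samples
    return np.interp(x, xp, values).tolist()


print(resize_interp([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 19))
print(resize_interp([1, 2, 4, 8, 16], 3))
```

Because the new positions run from index 0 to index n - 1, the first and last input values always appear in the output.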

Instead of determining the index directly, calculate the ratio of "steps" between indices in the two lists. Note that there is one fewer step than there are elements in a list. Then, for each output position, take the items at the floor and at the ceiling of that (fractional) position and compute their weighted average, using the fractional part of the position as the weight (see the figure below).
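Written out as formulas, for n input values s_0, ..., s_{n-1} and m >= 2 output values (these symbols are introduced here for illustration):

```latex
\text{step} = \frac{n-1}{m-1}, \qquad
x_i = i \cdot \text{step}, \qquad
r_i = x_i - \lfloor x_i \rfloor

y_i = (1 - r_i)\, s_{\lfloor x_i \rfloor} + r_i\, s_{\lceil x_i \rceil},
\qquad i = 0, 1, \dots, m-1
```

With the question's data (n = 10, m = 19), the step is 9/18 = 0.5, so the second output value is y_1 = 0.5 · 1 + 0.5 · 2 = 1.5.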

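import math

def sequenceResize(source, length):
    # Ratio of steps (gaps) in the input to steps in the output; assumes length >= 2
    step = float(len(source) - 1) / (length - 1)
    for i in range(length):
        key = i * step  # fractional position in the source
        low = source[int(math.floor(key))]
        # Clamp in case floating-point rounding pushes the last key just past the end
        high = source[min(int(math.ceil(key)), len(source) - 1)]
        ratio = key % 1  # fractional part = weight of the upper neighbour
        yield (1 - ratio) * low + ratio * high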

Or, a bit shorter, using divmod:

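def sequenceResize(source, length):
    step = float(len(source) - 1) / (length - 1)
    for i in range(length):
        # Integer part = lower index, fractional part = interpolation weight
        low, ratio = divmod(i * step, 1)
        # Clamp so floating-point rounding can never index past the end
        high = min(low + 1, len(source) - 1) if ratio > 0 else low
        yield (1 - ratio) * source[int(low)] + ratio * source[int(high)]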

Examples (floats rounded to five decimal places):

>>> sequence = [1, 2, 4, 8, 16]
>>> list(sequenceResize(sequence, 5))
[1.0, 2.0, 4.0, 8.0, 16.0]
>>> list(sequenceResize(sequence, 3))
[1.0, 4.0, 16.0]
>>> list(sequenceResize(sequence, 10))
[1.0, 1.44444, 1.88889, 2.66667, 3.55556, 4.88889, 6.66667, 8.88889, 12.44444, 16.0]
>>> list(sequenceResize(sequence, 19))
[1.0, 1.22222, 1.44444, 1.66667, 1.88889, 2.22222, 2.66667, 3.11111, 3.55556, 4.0, 4.88889, 5.77778, 6.66667, 7.55556, 8.88889, 10.66667, 12.44444, 14.22222, 16.0]

A different example as an illustration: blue points are the original values, red points the interpolated ones.


The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.