What is the most efficient algorithm to find the midpoint of the index of a repeated sequence of numbers?

Question

a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0]

I want to obtain the index of the midpoint of each run of repeated values, i.e.

output_vector = [2, 8, 14, 19]

i.e. output_vector[0] is the index of the midpoint of the first sequence, 0, 0, 0, 0, 0

output_vector[1] is the midpoint of the second repeated sequence, 1, 1, 1, 1, 1, 1, 1

output_vector[2] is the midpoint of the third repeated sequence, -1, -1, -1, -1, -1

Answer 1

One way is to use itertools.groupby to find groups and calculate their midpoints:

from itertools import groupby

a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1,-1, 0, 0, 0, 0, 0]

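# Materialize each run of equal values as its own list.
groups = [list(g) for _, g in groupby(a)]
# Midpoint = number of elements in all earlier runs + half the current run's length.
output_vector = [sum(len(g) for g in groups[:i]) + len(x) // 2 for i, x in enumerate(groups)]
print(output_vector)
# [2, 8, 14, 19]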
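
To see where each number in that result comes from, it can help to print the start index and length of every run next to its midpoint. The following is only a sketch: `describe_runs` is a hypothetical helper name, and it assumes the midpoint of a run is `start + length // 2`, which is the rounding the answers here use.

```python
from itertools import groupby

def describe_runs(values):
    """Return (value, start, length, midpoint) for each run of equal values."""
    runs = []
    start = 0
    for value, group in groupby(values):
        length = sum(1 for _ in group)  # consume the group iterator to count it
        runs.append((value, start, length, start + length // 2))
        start += length  # the next run begins where this one ends
    return runs

a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0]
for run in describe_runs(a):
    print(run)
# (0, 0, 5, 2)
# (1, 5, 7, 8)
# (-1, 12, 5, 14)
# (0, 17, 5, 19)
```

The last element of each tuple is the value that ends up in output_vector.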

Answer 2

The itertools method is probably better and cleaner. Nonetheless, here's a method that uses math and statistics: it scans the list for the start and end index of each run of equal numbers and takes the floor of the median of those two indexes.

import math
import statistics as stat

a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0]

startIdx = 0
midpts = []
for idx in range(1, len(a) + 1):
    # A run ends where the value changes or the list ends;
    # idx is one past the last element of the run.
    if idx == len(a) or a[idx] != a[idx - 1]:
        midpts.append(math.floor(stat.median([startIdx, idx])))
        startIdx = idx

print(midpts)
# [2, 8, 14, 19]

Answer 3

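Another itertools-based solution, but more efficient: it keeps a running sum of the group lengths, so the list is traversed once and no group is copied into a list.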

from itertools import groupby

a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1,-1, 0, 0, 0, 0, 0]

output = []
psum = 0  # number of elements in all groups seen so far
for glen in (sum(1 for _ in g) for _, g in groupby(a)):
    output.append(psum + glen // 2)
    psum += glen

print(output)
# [2, 8, 14, 19]
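
One way to check the efficiency claim on your own machine is to time this approach against the one from Answer 1. The sketch below is an assumption-laden illustration, not a benchmark: the function names and the test data (300 runs of 5 values each) are made up for the example, and the timings will vary, so no expected output is shown.

```python
import timeit
from itertools import groupby

def midpoints_recount(values):
    # Approach from Answer 1: re-count all earlier groups for every group.
    groups = [list(g) for _, g in groupby(values)]
    return [sum(1 for x in groups[:i] for _ in x) + len(x) // 2
            for i, x in enumerate(groups)]

def midpoints_running_sum(values):
    # Approach from Answer 3: carry the start offset forward in one pass.
    output = []
    psum = 0
    for glen in (sum(1 for _ in g) for _, g in groupby(values)):
        output.append(psum + glen // 2)
        psum += glen
    return output

# Hypothetical test data: 300 runs of 5 equal values each.
data = [i % 3 for i in range(300) for _ in range(5)]
assert midpoints_recount(data) == midpoints_running_sum(data)

for fn in (midpoints_recount, midpoints_running_sum):
    seconds = timeit.timeit(lambda: fn(data), number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 calls")
```

The gap should widen as the number of runs grows, because the first function's work grows with the number of runs times the list length, while the second touches each element once.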

Answer 4

C++ based implementation of @Matt M's answer

#include <cstddef>
#include <vector>

// Midpoint of the half-open index range [start, end).
inline std::size_t findMedian(std::size_t start, std::size_t end) {
    return start + (end - start) / 2;
}

template <typename T>
std::vector<std::size_t> getPeaks(const std::vector<T>& input_vector) {
    std::vector<std::size_t> output;
    std::size_t startIdx = 0;
    for (std::size_t i = 1; i <= input_vector.size(); ++i) {
        // A run ends at the end of the input or where the value changes.
        // Comparing neighbours avoids a sentinel value such as 10000,
        // which would break if the data itself contained it.
        if (i == input_vector.size() || input_vector[i] != input_vector[i - 1]) {
            output.emplace_back(findMedian(startIdx, i));
            startIdx = i;
        }
    }
    return output;
}

4 answers

solution1
2 2019-07-12 17:37:59

solution2
2 ACCEPTED 2019-07-12 17:53:21

solution3
2 2019-07-12 20:21:30

solution4
1 2019-07-15 21:20:57