
How to group consecutive elements of a 1D array, preferably in Python

I have the following 1D array:

[0, 0, 0, 1, 0, 0, 16, 249, 142, 149, 189, 135, 141, 146, 294, 3, 2, 0, 3, 3, 6, 2, 3, 4, 21, 22, 138, 95, 86, 110, 72, 89, 79, 138, 14, 18, 18, 18, 12, 15, 21, 22, 11, 20, 26, 90, 62, 128, 94, 117, 81, 81, 137, 7, 13, 14, 6, 10, 8, 11, 10, 13, 21, 18, 140, 69, 147, 110, 112, 88, 100, 197, 9, 20, 5, 6, 5, 4, 7, 10, 21, 32, 42, 56, 41, 156, 95, 112, 81, 93, 152, 14, 19, 9, 12, 20, 18, 14, 21, 18, 18, 14, 91, 47, 43, 63, 41, 45, 43, 85, 15, 16, 14, 10, 11]

I can see a pattern in where the spikes are, so I want the array above grouped like this:

[[0, 0, 0, 1, 0, 0, 16], [249, 142, 149, 189, 135, 141, 146, 294], [3, 2, 0, 3, 3, 6, 2, 3, 4, 21, 22], [138, 95, 86, 110, 72, 89, 79, 138]....so on]

I tried k-means and some combinations of mean and standard deviation, but none of them produce this kind of grouping. Please help!

Edit: These values are the dark-pixel values of a grayscale image summed along the x axis, one sum per position on the y axis. Groups of high values represent written lines and groups of low values represent blank lines, so what I want is to separate the written lines from the blank lines in the image. There is a pattern: written lines all have the same width, so their groups should have the same length. Blank lines may contain sudden spikes caused by background noise. Overall I can see the pattern of written and blank lines by eye; I want to find it programmatically.

A simple threshold-based approach will work in this case.

import numpy as np

x = np.array([0, 0, 0, 1, 0, 0, 16, 249, 142, 149, 189, 135, 141, 146, 294, 3, 2, 
              0, 3, 3, 6, 2, 3, 4, 21, 22, 138, 95, 86, 110, 72, 89, 79, 138, 14, 
              18, 18, 18, 12, 15, 21, 22, 11, 20, 26, 90, 62, 128, 94, 117, 81, 
              81, 137, 7, 13, 14, 6, 10, 8, 11, 10, 13, 21, 18, 140, 69, 147, 
              110, 112, 88, 100, 197, 9, 20, 5, 6, 5, 4, 7, 10, 21, 32, 42, 56, 
              41, 156, 95, 112, 81, 93, 152, 14, 19, 9, 12, 20, 18, 14, 21, 18, 
              18, 14, 91, 47, 43, 63, 41, 45, 43, 85, 15, 16, 14, 10, 11])

mask = x > 30  # True where the value is above the threshold

cuts = np.flatnonzero(np.diff(mask))  # indices where the mask changes
cuts = np.hstack([0, cuts + 1, len(x)])  # point just past each change; add the start and end of the array (len(x), not -1, so the last element is kept)

groups = []
for a, b in zip(cuts[:-1], cuts[1:]):  # iterate over consecutive index pairs
    groups.append(x[a:b].tolist())
print(groups)

# [[0, 0, 0, 1, 0, 0, 16], [249, 142, 149, 189, 135, 141, 146, 294], [3, 2, 0, 3, 3, 6, 2, 3, 4, 21, 22], [138, 95, 86, 110, 72, 89, 79, 138], [14, 18, 18, 18, 12, 15, 21, 22, 11, 20, 26], [90, 62, 128, 94, 117, 81, 81, 137], [7, 13, 14, 6, 10, 8, 11, 10, 13, 21, 18], [140, 69, 147, 110, 112, 88, 100, 197], [9, 20, 5, 6, 5, 4, 7, 10, 21], [32, 42, 56, 41, 156, 95, 112, 81, 93, 152], [14, 19, 9, 12, 20, 18, 14, 21, 18, 18, 14], [91, 47, 43, 63, 41, 45, 43, 85], [15, 16, 14, 10, 11]]
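The same threshold split can probably be written more compactly with np.split, which cuts an array at a list of indices and always keeps the first and last pieces, so no explicit start and end indices are needed. This is only a sketch: the function name split_by_threshold and the small demo array are made up for illustration, and the threshold still has to be picked by hand.

```python
import numpy as np

def split_by_threshold(values, threshold):
    """Split a 1D sequence into runs that lie above or below a threshold.

    Hypothetical helper for illustration; not part of NumPy.
    """
    values = np.asarray(values)
    mask = values > threshold
    # Positions i where element i and element i-1 fall on different
    # sides of the threshold; each one starts a new group.
    cuts = np.flatnonzero(mask[1:] != mask[:-1]) + 1
    return [segment.tolist() for segment in np.split(values, cuts)]

demo = [0, 1, 50, 60, 2]
print(split_by_threshold(demo, 30))
```

Applied to x with a threshold of 30, this should give the same grouping as the loop, including the final element.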

More sophisticated approaches could involve fitting a piecewise-constant model or detecting statistical non-stationarities, but usually it's best to stick with the simplest method that works.
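If isolated noise spikes in the blank rows turn out to break the plain threshold, one small step that stays well short of a model fit is to run a sliding median over the profile before thresholding. This is a different technique from the ones named above, shown only as a sketch: median_smooth, the window size of 3, and the noisy demo array are all assumptions made for illustration, and a median filter only removes spikes narrower than about half the window, so it will not flatten a gradual ramp.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_smooth(values, window=3):
    """Sliding median with edge padding (hypothetical helper; window must be odd)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    values = np.asarray(values, dtype=float)
    half = window // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(values, half, mode="edge")
    return np.median(sliding_window_view(padded, window), axis=1)

# Made-up profile: one isolated spike (90) inside a blank region,
# followed by a genuine run of high values.
noisy = np.array([2, 3, 90, 4, 2, 120, 130, 125, 3, 2])
smoothed = median_smooth(noisy, window=3)
mask = smoothed > 30  # threshold the smoothed profile, not the raw one
print(mask.tolist())
```

The mask from the smoothed profile can then be used to find the cut indices, while the groups themselves are still sliced from the original, unsmoothed values.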
