Merging array slices in Python

Question

So I have a string that looks like this:

data="ABCABDABDABBCBABABDBCABBDBACBBCDB"

I take random 10-character slices from it:

start=int(random.random()*100)
end = start+10
slice = data[start:start+10]

What I want to do now is count the number of "gaps" or "holes" that were never sliced out at all.

slices_indices = []
for i in xrange(0,100):
    start=int(random.random()*100)
    end = start + 10
    slice = data[start:end]
    ...
    slices_indices.append([start,end])

For example, after running this a few times, I had covered this much:

ABCAB DABD ABBCBABABDB C ABBDBACBBCDB

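But that leaves two "gaps". Is there a "Pythonic" way to find the number of these gaps? So basically I am looking for a count_gaps function that takes the slice indices.

For example, for the above,

count_gaps(slices_indices)

would give me two.

Thanks in advance.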

Answer 1

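There are a few ways, although all of them involve a bit of messing about.

You could compare the strings you removed against the original string and work out which characters were never hit.

However, that is a very roundabout way of doing it, and it will not work properly if the same 10 characters occur twice in the string, e.g. something like 1234123.

A better solution is to store the values of i that you used (the slice start positions), then go back through the data string and compare the current position against those i values (plus 10). If it doesn't match, job done.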

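For example (a sketch, assuming data is the string from the question):

import random

# Make a list the same length as the string; False means "not used yet"
charsUsed = [False] * len(data)

# Do whatever
for n in range(0, 100):
    i = int(random.random() * len(data))  # slice start, picked as in the question
    piece = data[i:i + 10]                # ... whatever you were doing before

    # Store our "used chars" in the list
    for char in range(i, i + 10):
        if char < len(data):  # Don't go out of bounds on the list!
            charsUsed[char] = True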

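Then, to see which characters were not used, just walk the charsUsed array and count whatever it is you want to count (consecutive gaps, etc.).

Edit in response to the updated question: I would still use the method above to build a "which characters were used" array. Your count_gaps() function then just walks the array to "find" the gaps.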

For example (a rough sketch, but hopefully you get the idea). The idea is essentially to check whether the current position is false (i.e. unused) and the last position is true (used), which means this is the start of a "new" gap. If both are false, we are in the middle of a gap, and if both are true, we are in the middle of a "used" stretch of the string.

def find_gaps(charsUsed):
    # Count the gaps
    numGaps = 0
    # What did we look at last (to see if it's the start of a gap)
    # Assume it's True if you want to count "gaps" at the start of the string, assume it's False if you don't.
    lastPositionUsed = True

    for used in charsUsed:
        # Unused now, used just before: a new gap starts here
        if not used and lastPositionUsed:
            numGaps += 1
        lastPositionUsed = used

    return numGaps

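Another option would be to step through the charsUsed array again and "group" consecutive values into a smaller array, then count the values you want. It is essentially the same thing with a different approach. In the example above I simply ignore the insides of both kinds of group (the ones we don't want and the ones we do) and count only the boundaries between them.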
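
The grouping idea could be sketched roughly as follows. This is only an illustration written for Python 3; group_runs and count_gaps_grouped are hypothetical names that do not appear in the answer, and the input is assumed to be a list of booleans like charsUsed.

```python
def group_runs(chars_used):
    """Collapse consecutive equal values into [value, run_length] pairs."""
    runs = []
    for used in chars_used:
        if runs and runs[-1][0] == used:
            runs[-1][1] += 1        # still inside the same run
        else:
            runs.append([used, 1])  # a new run starts here
    return runs


def count_gaps_grouped(chars_used):
    """Count the runs of unused (False) positions."""
    return sum(1 for value, _ in group_runs(chars_used) if not value)


flags = [True, True, False, False, True, False, True]
print(group_runs(flags))          # [[True, 2], [False, 2], [True, 1], [False, 1], [True, 1]]
print(count_gaps_grouped(flags))  # 2
```

Unlike the boundary-counting loop, this version always counts an unused run at the very start of the list, and it keeps the run lengths around in case the gap sizes are wanted as well.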

Answer 2

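This is a bit of a messy task, but I think sets are the way to go. I hope the code below is self-explanatory, but if there are parts you don't understand, please let me know.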

#! /usr/bin/env python

''' Count gaps.

    Find and count the sections in a sequence that weren't touched by random slicing
    From http://stackoverflow.com/questions/26060688/merging-arrays-slices-in-python
    Written by PM 2Ring 2014.09.27
'''

import random
from string import ascii_lowercase


def main():
    def rand_slice():
        start = random.randint(0, len(data) - slice_width) 
        return start, start + slice_width

    #The data to slice
    data = 5 * ascii_lowercase
    print 'Data:\n%s\nLength : %d\n' % (data, len(data))

    random.seed(42)

    #A set to capture slice ranges
    slices = set()    
    slice_width = 10    
    num_slices = 10
    print 'Extracting %d slices from data' % num_slices
    for i in xrange(num_slices):
        start, end = rand_slice()
        slices |= set(xrange(start, end))
        data_slice = data[start:end].upper()
        print '\n%2d, %2d : %s' % (start, end, data_slice)
        data = data[:start] + data_slice + data[end:]
        print data
        #print sorted(slices)

    print '\nSlices:\n%s\n' % sorted(slices)

    print '\nSearching for gaps missed by slicing'
    unsliced = sorted(tuple(set(xrange(len(data))) - slices))
    print 'Unsliced:\n%s\n' % (unsliced,)

    gaps = []    
    if unsliced:
        last = start = unsliced[0]
        for i in unsliced[1:]:
            if i > last + 1:
                t = (start, last + 1)
                gaps.append(t)
                print t
                start = i
            last = i
        t = (start, last + 1)
        gaps.append(t)
        print t

    print '\nGaps:\n%s\nCount: %d' % (gaps, len(gaps))


if __name__ == '__main__':
    main()
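
For readers who only want the count, the same set-based idea can be condensed into something close to the count_gaps function the question asks for. This is a minimal sketch for Python 3, not the answer's own code; the extra length parameter is an assumption, added because the slice indices alone do not say how long the data is.

```python
def count_gaps(slices_indices, length):
    """Count maximal runs of indices in range(length) not covered by any [start, end) slice."""
    covered = set()
    for start, end in slices_indices:
        # Clamp each slice to the data so out-of-range slices add nothing
        covered.update(range(max(start, 0), min(end, length)))

    unsliced = sorted(set(range(length)) - covered)

    gaps = 0
    previous = None
    for index in unsliced:
        # A jump of more than one index means a new gap has started
        if previous is None or index > previous + 1:
            gaps += 1
        previous = index
    return gaps


print(count_gaps([[0, 5], [6, 10], [10, 21], [22, 33]], 33))  # 2
```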

Answer 3

I would use some kind of bitmap. For example, extending your code:

import random

data="ABCABDABDABBCBABABDBCABBDBACBBCDB"

slices_indices = [0]*len(data)
for i in xrange(0,100):
    start=int(random.random()*len(data))
    end=start + 10
    slice = data[start:end]
    slices_indices[start:end] = [1] * len(slice)

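I have used a list here, but if your data is large you could use any other appropriate data structure, possibly something more compact.

So we have initialized the bitmap with zeros and marked the selected chunks of data with ones. Now we can use something from itertools, for example: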

from itertools import groupby

groups = groupby(slices_indices)

groupby returns an iterator in which each element is a tuple (element, iterator). To just count the gaps you can do something simple like:

gaps = len([x for x in groups if x[0] == 0])
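
If the data is too large for a per-character list, one alternative is to skip the bitmap entirely and sweep over the sorted slice ranges. Note that this is a different technique from the bitmap above (interval merging), sketched here for Python 3 under the assumption that slices are [start, end) pairs; count_gaps_by_merging and its length parameter are hypothetical names.

```python
def count_gaps_by_merging(slices_indices, length):
    """Count uncovered stretches of range(length) given [start, end) slices."""
    gaps = 0
    covered_up_to = 0  # every index below this is covered or already counted
    for start, end in sorted(slices_indices):
        start = max(start, 0)
        end = min(end, length)
        if start >= end:
            continue  # empty slice, or entirely outside the data
        if start > covered_up_to:
            gaps += 1  # uncovered stretch [covered_up_to, start)
        covered_up_to = max(covered_up_to, end)
    if covered_up_to < length:
        gaps += 1  # uncovered tail at the end of the data
    return gaps


print(count_gaps_by_merging([[0, 5], [6, 10], [10, 21], [22, 33]], 33))  # 2
```

Like the groupby version, this counts uncovered stretches at the very start and end of the data as gaps. Memory use depends on the number of slices rather than on the length of the data.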
