
Creating a “slice notation” style list from a set of numbers in Python

Say I have a set of ~100,000 different numbers. Some are sequential, some are not.

To demonstrate the problem, a small subset of these numbers might be:

(a) {1,2,3,4,5,6,7,8,9,11,13,15,45,46,47,3467}

An efficient way of writing this subset is as follows:

(b) 1:9:1,11:15:2,45:47:1,3467

This is effectively an extended version of Python's and MATLAB's slice notation.

My question is: how can I efficiently obtain a list in the latter notation in Python, from a list of the former type?

I.e., given (a), how can I efficiently obtain (b) in Python?

Disclaimer: I misread the question and thought you wanted to go from the slice notation to the set version. This doesn't actually answer your question, but I figured it was worth leaving posted. It also seems that numpy.r_ does the same (or at least a very similar) thing.

First off, note that if you are using Python 3.5+, PEP 448 gives you the option to use * unpacking in set literals:

>>> {*range(1,9), *range(11,15,2), *range(45,47), 3467}
{1, 2, 3, 4, 5, 6, 7, 8, 11, 3467, 13, 45, 46}
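Because range excludes its stop value, the set printed above is missing 9, 15 and 47 from the question's example. A minimal sketch with each stop moved one past the last wanted value (the names question_set and rebuilt are only for illustration):

```python
# Set (a) from the question, written out in full.
question_set = {1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 15, 45, 46, 47, 3467}

# range() stops before its stop argument, so each stop here is
# one past the last value that should be included.
rebuilt = {*range(1, 10), *range(11, 16, 2), *range(45, 48), 3467}

print(rebuilt == question_set)  # True
```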

Otherwise, the notation 11:15:2 is only valid inside subscript brackets, where it reaches an object's __getitem__ or __setitem__ as a slice object, so you just need to set up an object that will generate your sets:

def slice_to_range(slice_obj):
    assert isinstance(slice_obj, slice)
    assert slice_obj.stop is not None, "cannot have stop of None"
    start = slice_obj.start or 0
    stop = slice_obj.stop
    step = slice_obj.step or 1
    return range(start,stop,step)

class Slice_Set_Creator:
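    def __getitem__(self,item):
        # a single index (e.g. obj[1:9]) arrives bare rather than inside a tuple
        if not isinstance(item, tuple):
            item = (item,)
        my_set = set()
        for part in item:
            if isinstance(part,slice):
                my_set.update(slice_to_range(part))
            else:
                my_set.add(part)
        return my_set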

slice_set_creator = Slice_Set_Creator()

desired_set = slice_set_creator[1:9:1,11:15:2,45:47:1,3467]

>>> desired_set
{1, 2, 3, 4, 5, 6, 7, 8, 11, 3467, 13, 45, 46}
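If the input is the inclusive string form (b) from the question rather than Python slice syntax, a small parser can do the same job. This is only a rough sketch and not part of the answer above: parse_inclusive is a hypothetical helper, and it assumes positive steps and well-formed input.

```python
def parse_inclusive(text):
    """Expand a string such as '1:9:1,11:15:2,3467' into a set of ints.

    Unlike Python slices, the stop value is treated as inclusive,
    matching notation (b) in the question. Assumes a positive step.
    """
    numbers = set()
    for part in text.split(","):
        fields = [int(field) for field in part.split(":")]
        if len(fields) == 1:
            numbers.add(fields[0])  # a lone number
        else:
            start, stop = fields[0], fields[1]
            step = fields[2] if len(fields) == 3 else 1
            # stop + 1 makes the stop value inclusive
            numbers.update(range(start, stop + 1, step))
    return numbers


expanded = parse_inclusive("1:9:1,11:15:2,45:47:1,3467")
print(sorted(expanded))
```

With inclusive stops, the string from the question expands back to exactly set (a).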

I think I got it, but the following code was not very thoroughly tested and may contain bugs.

Basically, get_partial_slices tries to build partial_slice objects: when the next number in the (sorted) set does not .fit() into the current slice, that slice is .end()ed and the next slice is started.

If a slice has only 1 item in it (or 2 items and step != 1), it is represented as separate numbers instead of a slice. This is why yield from current.end() is needed: ending the slice may produce two numbers instead of one slice.

class partial_slice:
    """heavily relied on by get_partial_slices
This attempts to create a slice from repeatedly adding numbers
once a number that doesn't fit the slice is found use .end()
to generate either the slice or the individual numbers"""
    def __init__(self, n):
        self.start = n
        self.stop = None
        self.step = None
    def fit(self,n):
        "returns True if n fits as the next element of the slice (or False if it does not)"
        if self.step is None:
            return True #always take the second element into consideration
        elif self.stop == n:
            return True #n fits perfectly with current stop value
        else:
            return False

    def add(self, n):
        """adds a number to the end of the slice, 
    will raise a ValueError if the number doesn't fit"""
        if not self.fit(n):
            raise ValueError("{} does not fit into the slice".format(n))
        if self.step is None:
            self.step = n - self.start
        self.stop = n+self.step

    def to_slice(self):
        "return slice(self.start, self.stop, self.step)"
        return slice(self.start, self.stop, self.step)
    def end(self):
        "generates at most 3 items, may split up small slices"
        if self.step is None:
            yield self.start
            return
        length = (self.stop - self.start)//self.step
        if length>2:
            #always keep slices that contain more than 2 items
            yield self.to_slice()
            return 
        elif self.step==1 and length==2:
            yield self.to_slice()
            return
        else:
            yield self.start
            yield self.stop - self.step


def get_partial_slices(set_):
    data = iter(sorted(set_))
    try:
        current = partial_slice(next(data))
    except StopIteration:
        return  # an empty set yields nothing (a bare next() would raise RuntimeError here)
    for n in data:
        if current.fit(n):
            current.add(n)
        else:
            yield from current.end()
            current = partial_slice(n)
    yield from current.end()


test_case = {1,2,3,4,5,6,7,8,9,11,13,15,45,46,47,3467}
result = tuple(get_partial_slices(test_case))

#slice_set_creator is from my other answer,
#this will verify that the result was the same as the test case.
assert test_case == slice_set_creator[result] 

def slice_formatter(obj):
    if isinstance(obj,slice):
        # the actual slice objects, like all indexing in Python, don't include the stop value.
        # The stop is adjusted here when printing rather than when the slice is created, so the
        # slice objects can still be used directly in code (like with slice_set_creator)
        inclusive_stop = obj.stop - obj.step
        return "{0.start}:{stop}:{0.step}".format(obj, stop=inclusive_stop)
    else:
        return repr(obj)

print(", ".join(map(slice_formatter,result)))
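For comparison, the same idea can be sketched as a single function that goes straight from the set to the string in notation (b). This is an alternative greedy sketch, not the code above: to_slice_notation is a hypothetical name, stops are written inclusively, and any run shorter than three numbers is emitted one number at a time.

```python
def to_slice_notation(numbers):
    """Greedy sketch: encode ints as 'start:stop:step' runs with an inclusive stop."""
    values = sorted(numbers)
    parts = []
    i = 0
    while i < len(values):
        j = i  # index of the last value in the current run
        if i + 1 < len(values):
            step = values[i + 1] - values[i]
            j = i + 1
            # extend the run while the spacing stays constant
            while j + 1 < len(values) and values[j + 1] - values[j] == step:
                j += 1
        length = j - i + 1
        if length >= 3:
            parts.append("{}:{}:{}".format(values[i], values[j], step))
            i = j + 1
        else:
            # too short for a slice: emit one number and let the next
            # number start a fresh run
            parts.append(str(values[i]))
            i += 1
    return ",".join(parts)


print(to_slice_notation({1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 15, 45, 46, 47, 3467}))
# 1:9:1,11:15:2,45:47:1,3467
```

Advancing by only one number after a short run lets the second number begin a longer run of its own, so {1, 2, 4, 6, 8} comes out as 1,2:8:2. The sort dominates the cost, and the scan after it is a single pass.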

The simplest way is to use numpy's r_[] syntax. So for your example, it would just be:

>>> from numpy import r_
>>>
>>> a = r_[1:10, 11:17:2, 45:48, 3467]

Keep in mind that Python slices don't include the last number, and that a step of 1 (x:y:1) is implied when the step is omitted. This approach will not be as fast in production code as another, more sophisticated solution, but it is good for interactive use.

You can see that this gives you a numpy array with the numbers you want:

>>> print(a)
[   1    2    3    4    5    6    7    8    9   11   13   15   45   46   47
 3467]
