Python "R Cut" function

Question

I am wondering whether Python already has a function for the goal I describe below. If not, what would be the easiest way to implement it? My code is attached.

Say I have a range from 1 to 999999999. Given a list of numbers like this:

[9, 44, 99]

It would return:

[(1,9), (10,44), (45,99), (100, 999999999)]

If the numbers that are the limits are included in the input, it should handle that as well. Say the input is

[1, 9, 44, 999999999]

The result should be:

[(1,9), (10,44), (45, 999999999)]

I could write a for loop with a few conditional statements, but I am wondering if there is a smarter way.

Some data massaging that might be helpful:

points = [1, 9, 44, 99]
points = sorted(list(set(points + [1, 999999999])))

UPDATE: Final credit goes to alecxe; thanks for the inspiring list comprehension solution:

l = sorted(list(set(points + [1, 999999999])))
[(l[i] + int(i != 0), l[i + 1]) for i in xrange(len(l) - 1)]

You could put all of that on one line, but I think that is unnecessary.
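Since xrange no longer exists in Python 3, here is a minimal sketch of the same comprehension for Python 3, wrapped in a function. The name cut_ranges and the low/high defaults are illustrative choices, not part of the thread. The set union absorbs duplicate points and points equal to the limits, which is what makes the second example work.

```python
def cut_ranges(points, low=1, high=999999999):
    """Split [low, high] into consecutive ranges, each ending at a cut point.

    Hypothetical helper: same logic as the comprehension above, in Python 3 syntax.
    """
    # Merge the limits into the cut points; the set drops duplicates,
    # so a point equal to low or high does not create an extra range.
    bounds = sorted(set(points) | {low, high})
    # The first range starts at low itself; every later range starts
    # one past the previous boundary.
    return [(bounds[i] + int(i != 0), bounds[i + 1]) for i in range(len(bounds) - 1)]


print(cut_ranges([9, 44, 99]))
# [(1, 9), (10, 44), (45, 99), (100, 999999999)]
print(cut_ranges([1, 9, 44, 999999999]))
# [(1, 9), (10, 44), (45, 999999999)]
```

Because the points are sorted inside the function, unsorted input such as [44, 9] gives the same result as [9, 44].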

Answer 1

pandas.cut()

Example

[1,2,3,4,5,6,7,8,9,10] ---> [A,A,B,B,C,C,D,D,E,E]

R:

x  <- seq(1,10,1)
cut(x, breaks = seq(0,10,2), labels = c('A','B','C','D','E'))

Python:

import pandas
x = range(1, 11, 1)
pandas.cut(x, bins=range(0, 12, 2), labels=['A','B','C','D','E'])
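pandas.cut requires pandas to be installed. As a standard-library alternative (a swapped-in technique, not how pandas implements cut), the bisect module can reproduce the right-closed binning that pandas.cut uses by default (right=True). cut_labels is a hypothetical helper name; the sketch assumes sorted, unique edges and returns None where pandas would give NaN.

```python
from bisect import bisect_left


def cut_labels(values, edges, labels):
    """Label each value by the right-closed bin (edges[k], edges[k + 1]] it falls in.

    Hypothetical helper, not part of pandas. Assumes `edges` is sorted ascending
    with no duplicates. Values outside (edges[0], edges[-1]] get None.
    """
    edges = list(edges)
    if len(labels) != len(edges) - 1:
        raise ValueError("need exactly one label per bin")
    result = []
    for v in values:
        # bisect_left returns the index of the first edge >= v,
        # so v belongs to the bin that ends at that edge
        k = bisect_left(edges, v) - 1
        result.append(labels[k] if 0 <= k < len(labels) else None)
    return result


print(cut_labels(range(1, 11), range(0, 12, 2), ["A", "B", "C", "D", "E"]))
# ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D', 'E', 'E']
```

For the question's integer ranges, edges [0, 9, 44, 99, 999999999] give the bins (0, 9], (9, 44], (44, 99], (99, 999999999], which contain the same integers as (1, 9), (10, 44), (45, 99), (100, 999999999).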

Answer 2

def myCut(low, high, points):
    answer = []
    curr = low
    for point in points:
        answer.append((curr, point))
        curr = point + 1
    answer.append((curr, high))
    return answer

>>> low = 1
>>> high = 999999999
>>> points = [9, 44, 109]
>>> myCut(low, high, points)
[(1, 9), (10, 44), (45, 109), (110, 999999999)]
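As written, myCut assumes sorted points that lie strictly inside (low, high). With the question's second input [1, 9, 44, 999999999] it would emit (1, 1) and (2, 9) at the start and an inverted (1000000000, 999999999) at the end. A minimal sketch of a guarded variant (my_cut_checked is a hypothetical name, not from the thread) filters and sorts the points before looping:

```python
def my_cut_checked(low, high, points):
    """Hypothetical variant of myCut that tolerates unsorted or duplicate
    points and points equal to (or outside) the limits."""
    answer = []
    curr = low
    # Keep only interior cut points, sorted and de-duplicated
    for point in sorted({p for p in points if low < p < high}):
        answer.append((curr, point))
        curr = point + 1
    answer.append((curr, high))
    return answer


print(my_cut_checked(1, 999999999, [1, 9, 44, 999999999]))
# [(1, 9), (10, 44), (45, 999999999)]
```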

Inspired by this answer and the discussion that followed, here's a shorter solution using itertools. It uses itertools.chain and itertools.izip (in Python 2.7; plain zip in Python 3.x) to avoid the time and space cost of concatenating lists, sorting, and building a set. Note that the solution assumes the input list is already sorted; otherwise it produces incorrect results.

cuts = [(i+1, j) for i,j in itertools.izip(itertools.chain([0], myList), itertools.chain(myList, [999999999]))]

>>> import itertools
>>> myList = [9, 44, 99]
>>> [(i+1, j) for i,j in itertools.izip(itertools.chain([0], myList), itertools.chain(myList, [999999999]))]
[(1, 9), (10, 44), (45, 99), (100, 999999999)]
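In Python 3, itertools.izip is gone and the built-in zip is already lazy. A minimal Python 3 sketch of the same chain/zip idea follows; chain_cuts is a hypothetical name, and skipping points equal to the limits is an addition (not part of the answer above) so the question's second input also works. itertools.tee splits the filtered stream so that points may be a one-shot iterator.

```python
from itertools import chain, tee


def chain_cuts(points, low=1, high=999999999):
    """Python 3 sketch of the chain/zip one-liner (hypothetical helper name).

    Assumes `points` is sorted. Points equal to or outside the limits are skipped.
    """
    # Filter lazily, then split the stream so it can feed both chains
    starts_src, ends_src = tee(p for p in points if low < p < high)
    starts = chain([low - 1], starts_src)  # previous boundary; a range starts one past it
    ends = chain(ends_src, [high])         # each range ends at the next boundary
    # zip pairs them up lazily, like Python 2's itertools.izip
    return [(i + 1, j) for i, j in zip(starts, ends)]


print(chain_cuts([9, 44, 99]))
# [(1, 9), (10, 44), (45, 99), (100, 999999999)]
print(chain_cuts(iter([1, 9, 44, 999999999])))
# [(1, 9), (10, 44), (45, 999999999)]
```

On Python 3.10 and later, itertools.pairwise applied to chain([low - 1], <filtered points>, [high]) yields the same consecutive pairs without needing tee.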

Answer 3

Not sure this approach is the best one:

>>> l = [1, 9, 44, 999999999]
>>> [(l[i] + int(i != 0), l[i + 1]) for i in xrange(len(l) - 1)]
[(1, 9), (10, 44), (45, 999999999)]

If you are on Python 3, replace xrange with range.

Note that for your first example to work, you'll need to prepend and append your boundaries:

>>> l = [9, 44, 109]
>>> low, high = 1, 999999999
>>> l = [low] + l + [high]
>>> [(l[i] + int(i != 0), l[i + 1]) for i in xrange(len(l) - 1)]
[(1, 9), (10, 44), (45, 109), (110, 999999999)]

Answer 4

Comparing the code from the answers with timeit, it looks like inspectorG4dget's solution performs much better (especially with Python 3), even though I left out the step that adds the low and high values in the list comprehension solution:

ls = [9, 44, 109, 200, 567, 894, 6879, 29823]

def f1(low, high, points):
    answer = []
    curr = low
    for point in points:
        answer.append((curr, point))
        curr = point + 1
    answer.append((curr, high))
    return answer

def f2(low, high, l):
    a = [(l[i] + int(i != 0), l[i + 1]) for i in range(len(l) - 1)]
    return a

if __name__ == '__main__':
    import timeit

    print(timeit.timeit("f1(1, 99999999, ls)", setup="from __main__ import f1, ls"))
    print(timeit.timeit("f2(1, 99999999, ls)", setup="from __main__ import f2, ls"))

Results (Python 3 on my netbook):

3.2064807919996383
8.850830605999363
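The timing above runs f2 without the boundary step. A minimal sketch of a like-for-like harness, assuming the f2 variant should prepend low and append high as Answer 3 does, and checking that both functions return the same list before timing them. Timings depend on the machine, so no numbers are claimed here; the 100,000-call count is an arbitrary choice.

```python
import timeit

ls = [9, 44, 109, 200, 567, 894, 6879, 29823]


def f1(low, high, points):
    # Loop version from Answer 2 (myCut), unchanged
    answer = []
    curr = low
    for point in points:
        answer.append((curr, point))
        curr = point + 1
    answer.append((curr, high))
    return answer


def f2(low, high, points):
    # Comprehension version from Answer 3, now including the step that
    # prepends low and appends high, so both functions do the same work
    l = [low] + points + [high]
    return [(l[i] + int(i != 0), l[i + 1]) for i in range(len(l) - 1)]


if __name__ == "__main__":
    # Sanity check first: a timing comparison only means something if the
    # two functions return identical results
    assert f1(1, 99999999, ls) == f2(1, 99999999, ls)
    for name in ("f1", "f2"):
        seconds = timeit.timeit(f"{name}(1, 99999999, ls)", globals=globals(), number=100_000)
        print(f"{name}: {seconds:.3f} s for 100,000 calls")
```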


4 answers

Answer 1
17 votes, accepted, 2016-11-08 16:51:40

Answer 2
1 vote, 2013-08-30 18:06:15

Answer 3
1 vote, 2013-08-30 18:06:22

Answer 4
0 votes, 2013-08-30 20:33:08
