Python “R Cut” function

I am wondering whether Python already has a function for the goal I describe below. If not, what would be the easiest way to implement it? My code is attached.

Say I have a range from 1 to 999999999. Given a list of numbers like this:

[9, 44, 99]

It would return:

[(1,9), (10,44), (45,99), (100, 999999999)]

If the limits themselves are included in the input numbers, it should handle that too. Say the input is:

[1, 9, 44, 999999999]

The return value should be:

[(1,9), (10,44), (45, 999999999)]

I could write a for loop with a few conditional statements, but I am wondering whether there is a smarter way.

Some data massaging that might be helpful:

points = [1, 9, 44, 99]
points = sorted(list(set(points + [1, 999999999])))

UPDATED INFO: FINAL CREDITS GIVEN TO alecxe, thanks for your inspiring list comprehension solution

l = sorted(list(set(points + [1, 999999999])))
[(l[i] + int(i != 0), l[i + 1]) for i in xrange(len(l) - 1)]

You can put all of that on one line, but I think that is unnecessary.

pandas.cut()

Example

[1,2,3,4,5,6,7,8,9,10] ---> [A,A,B,B,C,C,D,D,E,E]

R:

x  <- seq(1,10,1)
cut(x, breaks = seq(0,10,2), labels = c('A','B','C','D','E'))

Python:

import pandas
x = range(1, 11, 1)
pandas.cut(x, bins=range(0, 12, 2), labels=['A','B','C','D','E'])
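
To make the binning rule concrete, here is a minimal standard-library sketch of what both calls above compute for this example. It assumes right-closed bins (0, 2], (2, 4], and so on, which is the default in both R's cut and pandas.cut. The helper name label_for is hypothetical and not part of pandas.

```python
from bisect import bisect_left

edges = [0, 2, 4, 6, 8, 10]           # same break points as the example above
labels = ['A', 'B', 'C', 'D', 'E']    # one label per bin

def label_for(value, edges=edges, labels=labels):
    """Hypothetical helper: label of the right-closed bin that holds value."""
    # bisect_left finds the first edge >= value, so a value equal to an
    # edge falls into the bin that ends at that edge.
    idx = bisect_left(edges, value) - 1
    if idx < 0 or idx >= len(labels):
        return None  # value lies outside (edges[0], edges[-1]]
    return labels[idx]

result = [label_for(v) for v in range(1, 11)]
print(result)
# ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D', 'E', 'E']
```

Unlike pandas.cut, this sketch returns a plain list and uses None for out-of-range values.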
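
def myCut(low, high, points):
    # points must be sorted and lie strictly between low and high
    answer = []
    curr = low  # start of the next range
    for point in points:
        answer.append((curr, point))
        curr = point + 1  # the next range starts just after this cut point
    answer.append((curr, high))  # the final range runs up to the upper limit
    return answer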

>>> low = 1
>>> high = 999999999
>>> points = [9, 44, 109]
>>> myCut(low, high, points)
[(1, 9), (10, 44), (45, 109), (110, 999999999)]
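
If a point equals low or high (the second case in the question), the loop above yields an unwanted (1, 1) pair and an inverted final pair. A minimal sketch of one way to cover that case, assuming points outside the open interval (low, high) can simply be dropped; the name cut_ranges is made up for illustration:

```python
def cut_ranges(low, high, points):
    """Split [low, high] into consecutive (start, end) pairs at the given points."""
    # Keep only cut points strictly inside the range; sort and de-duplicate them.
    inner = sorted({p for p in points if low < p < high})
    answer = []
    curr = low
    for point in inner:
        answer.append((curr, point))
        curr = point + 1
    answer.append((curr, high))
    return answer

print(cut_ranges(1, 999999999, [1, 9, 44, 999999999]))
# [(1, 9), (10, 44), (45, 999999999)]
```

Sorting inside the function also removes the requirement that the caller pass the points in order, at the cost of an extra pass over the input.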

Inspired by this answer and the discussions that followed, here's a solution in fewer lines, using itertools. It uses itertools.chain and itertools.izip (in Python 2.7; zip in Python 3.x) to avoid the time and space costs of concatenating lists, sorting, and building a set. Note that the solution assumes the input list is already sorted; otherwise it will produce erroneous results.

cuts = [(i+1, j) for i,j in itertools.izip(itertools.chain([0], myList), itertools.chain(myList, [999999999]))]

>>> import itertools
>>> myList = [9, 44, 99]
>>> [(i+1, j) for i,j in itertools.izip(itertools.chain([0], myList), itertools.chain(myList, [999999999]))]
[(1, 9), (10, 44), (45, 99), (100, 999999999)]
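
On Python 3, itertools.izip no longer exists and the built-in zip is already lazy, so the same idea can be sketched as follows. The function name cuts and the high default are illustrative, and the input is still assumed to be sorted and to lie strictly inside the range:

```python
from itertools import chain

def cuts(sorted_points, high=999999999):
    # Pair each "previous end" (starting from 0) with the next end point,
    # without building intermediate lists.
    starts = chain([0], sorted_points)
    ends = chain(sorted_points, [high])
    return [(i + 1, j) for i, j in zip(starts, ends)]

print(cuts([9, 44, 99]))
# [(1, 9), (10, 44), (45, 99), (100, 999999999)]
```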

Not sure this approach is the best one:

>>> l = [1, 9, 44, 999999999]
>>> [(l[i] + int(i != 0), l[i + 1]) for i in xrange(len(l) - 1)]
[(1, 9), (10, 44), (45, 999999999)]

If you are on Python 3, replace xrange with range.

Note that for your first example to work, you'll need to prepend and append your boundaries:

>>> l = [9, 44, 109]
>>> low, high = 1, 999999999
>>> l = [low] + l + [high]
>>> [(l[i] + int(i != 0), l[i + 1]) for i in xrange(len(l) - 1)]
[(1, 9), (10, 44), (45, 109), (110, 999999999)]

Comparing the code from the answers with timeit, it looks like inspectorG4dget's solution performs much better (especially on Python 3), even though I left out adding the low and high values in the list comprehension solution:

ls = [9, 44, 109, 200, 567, 894, 6879, 29823]

def f1(low, high, points):
    answer = []
    curr = low
    for point in points:
        answer.append((curr, point))
        curr = point + 1
    answer.append((curr, high))
    return answer

def f2(low, high, l):
    a = [(l[i] + int(i != 0), l[i + 1]) for i in range(len(l) - 1)]
    return a

if __name__ == '__main__':
    import timeit

    print(timeit.timeit("f1(1, 99999999, ls)", setup="from __main__ import f1, ls"))
    print(timeit.timeit("f2(1, 99999999, ls)", setup="from __main__ import f2, ls"))

Results (py3 on my netbook):

3.2064807919996383
8.850830605999363
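
The two timed functions above do not return the same result, because f2 receives the list without the boundaries. A sketch of a like-for-like comparison, assuming the boundaries are added inside the timed call (the names loop_cut and comp_cut are illustrative). Timings vary by machine, so none are quoted here:

```python
import timeit

ls = [9, 44, 109, 200, 567, 894, 6879, 29823]

def loop_cut(low, high, points):
    answer = []
    curr = low
    for point in points:
        answer.append((curr, point))
        curr = point + 1
    answer.append((curr, high))
    return answer

def comp_cut(low, high, points):
    l = [low] + points + [high]   # boundary handling is now part of the timed work
    return [(l[i] + int(i != 0), l[i + 1]) for i in range(len(l) - 1)]

# Both variants should agree before their speed is compared.
assert loop_cut(1, 999999999, ls) == comp_cut(1, 999999999, ls)

if __name__ == '__main__':
    for fn in (loop_cut, comp_cut):
        seconds = timeit.timeit(lambda: fn(1, 999999999, ls), number=100000)
        print(fn.__name__, seconds)
```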
