Python “R Cut” function
I am wondering whether Python already has a function for the goal I describe below. If not, what would be the easiest way to implement it? My code is attached.
Say I have a range from 1 to 999999999. Given a list of numbers like this:
[9, 44, 99]
It would return
[(1,9), (10,44), (45,99), (100, 999999999)]
If the numbers that are the limits of the range are included in the input, it should handle that as well. Say the input is
[1, 9, 44, 999999999]
The return should be:
[(1,9), (10,44), (45, 999999999)]
I could write a for loop with a few conditional statements, but I am wondering if there is a 'smarter' way.
Some data massaging that might be helpful:
points = [1, 9, 44, 99]
points = sorted(list(set(points + [1, 999999999])))
UPDATED INFO: FINAL CREDITS GIVEN TO alecxe, thanks for your inspiring list comprehension solution
l = sorted(list(set(points + [1, 999999999])))
[(l[i] + int(i != 0), l[i + 1]) for i in xrange(len(l) - 1)]
You can put all that in one line but I think that is unnecessary.
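For readers on Python 3, the comprehension above can be wrapped in a small function. This is only a sketch: the name `cut_ranges` and its `low`/`high` parameters are hypothetical, and `xrange` is replaced by `range`.

```python
def cut_ranges(points, low=1, high=999999999):
    """Split [low, high] into consecutive closed ranges ending at each point.

    Hypothetical helper; assumes every point lies within [low, high].
    """
    # Add both limits, drop duplicates, and sort the boundaries.
    bounds = sorted(set(points) | {low, high})
    # Every range except the first starts one past the previous boundary.
    return [(bounds[i] + int(i != 0), bounds[i + 1])
            for i in range(len(bounds) - 1)]


print(cut_ranges([9, 44, 99]))
# [(1, 9), (10, 44), (45, 99), (100, 999999999)]
print(cut_ranges([1, 9, 44, 999999999]))
# [(1, 9), (10, 44), (45, 999999999)]
```

Using a set means the limits can appear in the input without producing duplicate boundaries, which covers the second case from the question.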
Example例子
[1,2,3,4,5,6,7,8,9,10] ---> [A,A,B,B,C,C,D,D,E,E]
R:
x <- seq(1,10,1)
cut(x, breaks = seq(0,10,2), labels = c('A','B','C','D','E'))
Python:
import pandas
x = range(1, 11, 1)
pandas.cut(x, bins=range(0, 12, 2), labels=['A','B','C','D','E'])
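If pandas is not available, the same labelling can be approximated with the standard library. The sketch below is an assumption-laden stand-in, not how `pandas.cut` is implemented: the helper name `cut_labels` is hypothetical, and it uses right-closed bins, which is the default for both R's `cut` and `pandas.cut`.

```python
from bisect import bisect_left


def cut_labels(values, bins, labels):
    """Label each value by the right-closed bin (bins[k], bins[k + 1]] it falls in.

    Hypothetical helper; assumes `bins` is sorted and len(labels) == len(bins) - 1.
    Values outside (bins[0], bins[-1]] are labelled None.
    """
    result = []
    for value in values:
        # bisect_left finds the first edge >= value; the bin index is one less.
        k = bisect_left(bins, value) - 1
        result.append(labels[k] if 0 <= k < len(labels) else None)
    return result


x = range(1, 11)
print(cut_labels(x, list(range(0, 12, 2)), ['A', 'B', 'C', 'D', 'E']))
# ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D', 'E', 'E']
```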
def myCut(low, high, points):
    answer = []
    curr = low
    for point in points:
        answer.append((curr, point))
        curr = point + 1
    answer.append((curr, high))
    return answer
>>> low = 1
>>> high = 999999999
>>> points = [9, 44, 109]
>>> myCut(low, high, points)
[(1, 9), (10, 44), (45, 109), (110, 999999999)]
Inspired by this answer and the discussions that followed, here's a solution in fewer lines, with itertools. This uses itertools.chain and itertools.izip (in python2.7; zip in python3.x) to reduce the time and space complexities arising from adding lists, sorting and setifying. Note that the solution assumes that the input list is already sorted; if it is not, it will produce erroneous results.
cuts = [(i+1, j) for i,j in itertools.izip(itertools.chain([0], myList), itertools.chain(myList, [999999999]))]
>>> import itertools
>>> myList = [9, 44, 99]
>>> [(i+1, j) for i,j in itertools.izip(itertools.chain([0], myList), itertools.chain(myList, [999999999]))]
[(1, 9), (10, 44), (45, 99), (100, 999999999)]
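On Python 3, where `itertools.izip` no longer exists, the same idea might look like the sketch below. The function name `cut_sorted` and its `low`/`high` parameters are hypothetical, and the input is assumed to be sorted and to lie strictly between the limits.

```python
from itertools import chain


def cut_sorted(points, low=1, high=999999999):
    """Hypothetical helper; assumes `points` is sorted and strictly inside (low, high)."""
    starts = chain([low - 1], points)   # previous boundary for each range
    ends = chain(points, [high])        # closing boundary for each range
    # zip is lazy in Python 3, so no intermediate lists are built.
    return [(i + 1, j) for i, j in zip(starts, ends)]


print(cut_sorted([9, 44, 99]))
# [(1, 9), (10, 44), (45, 99), (100, 999999999)]
```

Because nothing is deduplicated here, passing a list that already contains a limit yields a degenerate range such as (1, 1), so the second case from the question would need extra handling.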
Not sure this approach is the best one:
>>> l = [1, 9, 44, 999999999]
>>> [(l[i] + int(i != 0), l[i + 1]) for i in xrange(len(l) - 1)]
[(1, 9), (10, 44), (45, 999999999)]
If you are on python 3, replace xrange with range.
Note that for your first example to work, you'll need to prepend and append your boundaries:
>>> l = [9, 44, 109]
>>> low, high = 1, 999999999
>>> l = [low] + l + [high]
>>> [(l[i] + int(i != 0), l[i + 1]) for i in xrange(len(l) - 1)]
[(1, 9), (10, 44), (45, 109), (110, 999999999)]
Comparing the code from the answers with timeit, it looks like inspectorG4dget's solution performs much better (especially with Python 3), even though I left out the addition of the low and high values in the list comprehension solution:
ls = [9, 44, 109, 200, 567, 894, 6879, 29823]

def f1(low, high, points):
    answer = []
    curr = low
    for point in points:
        answer.append((curr, point))
        curr = point + 1
    answer.append((curr, high))
    return answer

def f2(low, high, l):
    a = [(l[i] + int(i != 0), l[i + 1]) for i in range(len(l) - 1)]
    return a

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("f1(1, 99999999, ls)", setup="from __main__ import f1, ls"))
    print(timeit.timeit("f2(1, 99999999, ls)", setup="from __main__ import f2, ls"))
Results (py3 on my netbook):
3.2064807919996383
8.850830605999363
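Since f2 above skips adding the boundaries, the two functions do not return the same ranges. A like-for-like comparison might look like the sketch below; the names `loop_cut` and `comprehension_cut` and the reduced `number=100000` are my own choices, and timings will vary by machine, so none are quoted here.

```python
import timeit

LOW, HIGH = 1, 99999999
POINTS = [9, 44, 109, 200, 567, 894, 6879, 29823]


def loop_cut(low, high, points):
    answer = []
    curr = low
    for point in points:
        answer.append((curr, point))
        curr = point + 1
    answer.append((curr, high))
    return answer


def comprehension_cut(low, high, points):
    # Unlike f2 above, this includes the cost of adding the boundaries.
    l = [low] + points + [high]
    return [(l[i] + int(i != 0), l[i + 1]) for i in range(len(l) - 1)]


# Both variants should agree before their timings are compared.
assert loop_cut(LOW, HIGH, POINTS) == comprehension_cut(LOW, HIGH, POINTS)

for func in (loop_cut, comprehension_cut):
    seconds = timeit.timeit(lambda: func(LOW, HIGH, POINTS), number=100000)
    print(func.__name__, seconds)
```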