Reverse slice from end of list to a specific index
Let's say I need a slice from the end of a sequence seq to the first occurrence of a given item x (inclusive). The naive attempt to write seq[-1:seq.index(x)-1:-1] creates a subtle bug:
    seq = 'abc'
    seq[-1:seq.index('b')-1:-1]  # 'cb' as expected
    seq[-1:seq.index('a')-1:-1]  # '' because a stop of -1 is interpreted as the end of seq
Is there any idiomatic way to write this? seq[seq.index(x):][::-1] works fine, but it is presumably inefficient for large sequences since it creates an extra copy. (I do need a sequence in the end, so one copy is necessary; I just don't want to create a second copy.)
On a side note, this is a really easy bug to introduce: it can pass many tests, and it is undetectable by any static analyzer (unless the analyzer warns about every slice with a negative step).
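To see why the naive slice comes back empty, it can help to ask Python which concrete indices it derives from the slice. The sketch below uses the built-in slice.indices() method; the helper name resolve is made up for illustration.

```python
def resolve(seq, item):
    """Return the stop passed to the naive slice and the concrete
    (start, stop, step) that Python derives from it."""
    stop = seq.index(item) - 1
    return stop, slice(-1, stop, -1).indices(len(seq))

seq = 'abc'
for item in seq:
    stop, resolved = resolve(seq, item)
    print(item, stop, resolved, repr(seq[-1:stop:-1]))

# a -1 (2, 2, -1) ''
# b 0 (2, 0, -1) 'cb'
# c 1 (2, 1, -1) 'c'
```

For 'a' the computed stop is -1, which is normalized to index 2, the same position as the start, so the slice is empty.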
Update
There seems to be no perfect / idiomatic solution. I agree that it may not be the bottleneck as often as I thought, so I'll use [pos:][::-1] in most cases. When performance is important, I'd use the normal if check. However, I'll accept the solution that I found interesting even though it's hard to read; it's probably usable in certain rare cases (where I really need to fit the whole thing into an expression, and I don't want to define a new function).
Also, I tried timing this. For lists, there seems to be a penalty of roughly 2x for the extra slice, even when the lists are as short as 2 items. For strings, the results are extremely inconsistent, to the point that I can't conclude anything:
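    import timeit

    for n in (2, 5, 10, 100, 1000, 10000, 100000, 1000000):
        c = list(range(n))
        # c = 'x' * n
        pos = n // 2  # pretend the item was found in the middle
        # First expression: two slices. Second: one slice guarded by an if check.
        # Note that c[:pos:-1] stops just before c[pos]; the inclusive form is c[:pos-1:-1].
        exprs = 'c[pos:][::-1]', 'c[:pos:-1] if pos else c[::-1]'
        results = [timeit.Timer(expr, globals=globals()).autorange() for expr in exprs]
        times = [t/loops for loops, t in results]  # seconds per single evaluation
        print(n, times[0]/times[1])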
Results for lists (ratio of the time with the extra slice to the time without it):
    2 2.667782437753884
    5 2.2672817613246914
    10 1.4275235266754878
    100 1.6167102119737584
    1000 1.7309116253903338
    10000 3.606259720606781
    100000 2.636049703318956
    1000000 1.9915776615090277
Of course, this ignores the fact that whatever it is we're doing with the resulting slice is much more costly, in relative terms, when the slice is short. So still, I agree that for sequences of small size, [::-1] is usually perfectly fine.
If an iterator result is okay, use a forward slice and call reversed on it:
    reversed(seq[seq.index(whatever):])
If it isn't, subtract an extra len(seq) from the endpoint:
    seq[:seq.index(whatever)-len(seq)-1:-1]
Or just take a forward slice, slice it again to reverse it, and eat the cost of the extra copy. It's probably not your bottleneck.
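As a rough side-by-side sketch, the three options above can be wrapped in small functions and checked against the problematic case where the item sits at index 0. The function names are invented for this illustration.

```python
def tail_reversed_iter(seq, item):
    """One copy (the forward slice), then a lazy reverse iterator over it."""
    return reversed(seq[seq.index(item):])

def tail_reversed_single_slice(seq, item):
    """One negative-step slice. The stop lies between -len(seq)-1 and -2,
    so it can never be -1; a stop of -len(seq)-1 is clamped to
    'run through the first element'."""
    return seq[:seq.index(item) - len(seq) - 1:-1]

def tail_reversed_two_slices(seq, item):
    """Forward slice, then a second slice to reverse it (two copies)."""
    return seq[seq.index(item):][::-1]

for item in 'abc':
    print(item,
          ''.join(tail_reversed_iter('abc', item)),
          tail_reversed_single_slice('abc', item),
          tail_reversed_two_slices('abc', item))

# a cba cba cba
# b cb cb cb
# c c c c
```

The iterator variant still copies once for the forward slice; it only avoids the second copy.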
Whatever you do, leave a comment explaining it so people don't reintroduce the bug when editing, and write a unit test for this case.
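A minimal sketch of what that comment and test might look like, written as plain assert-based test functions that pytest would collect (they can also be called directly). The helper name tail_reversed is hypothetical.

```python
def tail_reversed(seq, item):
    pos = seq.index(item)
    # Do NOT simplify this to seq[-1:pos-1:-1]: when pos == 0 the stop
    # becomes -1, which means "last element", and the result is empty.
    # Subtracting len(seq) keeps the stop at -2 or below.
    return seq[:pos - len(seq) - 1:-1]

def test_item_at_index_zero():
    # The regression case: the item is the very first element.
    assert tail_reversed('abc', 'a') == 'cba'
    assert tail_reversed([1, 2, 3], 1) == [3, 2, 1]

def test_item_elsewhere():
    assert tail_reversed('abc', 'b') == 'cb'
    assert tail_reversed('abc', 'c') == 'c'
```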
IMHO, seq[seq.index(x):][::-1] is the most readable solution, but here's a way that's a little more efficient.
    def sliceback(seq, key):
        pos = seq.index(key)
        # A stop of None runs through the first item; pos-1 would be -1 when pos == 0.
        return seq[:pos-1 if pos else None:-1]

    seq = 'abc'
    for k in seq:
        print(k, sliceback(seq, k))
Output:
    a cba
    b cb
    c c
As Budo Zindovic mentions in the comments, .index will raise an exception if the char isn't found in the string. Depending on the context, the code may not ever be called with a char that's not in seq, but if it's possible we need to handle it. The simplest way to do that is to catch the exception:
    def sliceback(seq, key):
        try:
            pos = seq.index(key)
        except ValueError:
            # key is not in seq: return an empty string
            return ''
        return seq[:pos-1 if pos else None:-1]

    seq = 'abc'
    for k in 'abcd':
        print(k, sliceback(seq, k))
Output:
    a cba
    b cb
    c c
    d
Python exception handling is very efficient. When the exception isn't actually raised, it's faster than equivalent if-based code, but if the exception is raised more than about 5-10% of the time, it's faster to use an if.
Rather than testing for key before calling seq.index, it's more efficient to use find. Of course, that will only work if seq is a string; it won't work if seq is a list because (annoyingly) lists don't have a .find method.
    def sliceback(seq, key):
        pos = seq.find(key)  # find returns -1 when key is absent
        return '' if pos < 0 else seq[:pos-1 if pos else None:-1]
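Since find is unavailable on lists, a version meant to work on lists and tuples as well has to fall back to index and catch the exception. The sketch below also returns seq[:0] instead of '' so that the empty result has the same type as the input; the name sliceback_any is made up here, and this return value is a design choice of the sketch rather than something from the answer above.

```python
def sliceback_any(seq, key):
    """Reversed tail of seq starting at the first occurrence of key.
    Works for str, list and tuple; returns an empty sequence of the
    same type when key is absent."""
    try:
        pos = seq.index(key)
    except ValueError:
        return seq[:0]
    return seq[:pos - 1 if pos else None:-1]

print(sliceback_any('abc', 'a'))      # cba
print(sliceback_any([1, 2, 3], 2))    # [3, 2]
print(sliceback_any((1, 2, 3), 9))    # ()
```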
You can check for pos while assigning the string, for example:
    result = seq[-1:pos-1:-1] if pos > 0 else seq[::-1]
Input:
    pos = seq.index('a')
Output:
    cba