
Reverse slice from end of list to a specific index

Let's say I need a slice from the end of a sequence seq back to the first occurrence of a given item x (inclusive). The naive attempt, seq[-1:seq.index(x)-1:-1], contains a subtle bug:

seq = 'abc'
seq[-1:seq.index('b')-1:-1]  # 'cb' as expected
seq[-1:seq.index('a')-1:-1]  # '' because -1 is interpreted as end of seq
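
The empty result follows from how Python normalises slice bounds: a negative stop is counted from the end, so a stop of -1 means "the last item", not "one position before index 0". A small sketch using the built-in slice.indices method makes the normalised bounds visible (the helper name resolved is only for illustration):

```python
def resolved(start, stop, step, length):
    """Return the concrete (start, stop, step) Python uses for a sequence of this length."""
    return slice(start, stop, step).indices(length)

seq = 'abc'

# x == 'b': stop is 1 - 1 == 0, so the slice walks indices 2, 1.
print(resolved(-1, seq.index('b') - 1, -1, len(seq)))  # (2, 0, -1)

# x == 'a': stop is 0 - 1 == -1, which is normalised to index 2 (the last item),
# so start == stop and the slice is empty.
print(resolved(-1, seq.index('a') - 1, -1, len(seq)))  # (2, 2, -1)

# Omitting the stop (None) is the way to say "run past index 0".
print(resolved(-1, None, -1, len(seq)))                # (2, -1, -1)
```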

Is there any idiomatic way to write this?

seq[seq.index(x):][::-1] works fine, but it is presumably inefficient for large sequences since it creates an extra copy. (I do need a sequence in the end, so one copy is necessary; I just don't want to create a second copy.)

On a side note, this is a really easy bug to introduce: it can pass many tests, and it is undetectable by any static analyzer (unless the analyzer warns about every slice with a negative step).

Update

There seems to be no perfect / idiomatic solution. I agree that it may not be the bottleneck as often as I thought, so I'll use [pos:][::-1] in most cases. When performance is important, I'd use the normal if check. However, I'll accept the solution that I found interesting even though it's hard to read; it's probably usable in certain rare cases (where I really need to fit the whole thing into an expression, and I don't want to define a new function).

Also, I tried timing this. For lists, the extra slice seems to cost somewhere between about 1.4x and 3.6x, even when the lists are as short as 2 items. For strings, the results are extremely inconsistent, to the point that I can't say anything:

import timeit
for n in (2, 5, 10, 100, 1000, 10000, 100000, 1000000):
    c = list(range(n))
    # c = 'x' * n
    pos = n // 2 # pretend the item was found in the middle
    exprs = 'c[pos:][::-1]', 'c[:pos:-1] if pos else c[::-1]'
    results = [timeit.Timer(expr, globals=globals()).autorange() for expr in exprs]
    times = [t/loops for loops, t in results]
    print(n, times[0]/times[1])

Results for lists (n, followed by the ratio of the time with the extra slice to the time without it):

2 2.667782437753884
5 2.2672817613246914
10 1.4275235266754878
100 1.6167102119737584
1000 1.7309116253903338
10000 3.606259720606781
100000 2.636049703318956
1000000 1.9915776615090277

Of course, this ignores the fact that whatever we do with the resulting slice is, in relative terms, much more costly than the slicing itself when the slice is short. So I still agree that for small sequences, [::-1] is usually perfectly fine.

If an iterator result is okay, use a forward slice and call reversed on it:

reversed(seq[seq.index(whatever):])
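
As a minimal sketch of what this returns: reversed yields the items lazily from the forward slice, so the slice is the only copy made unless the iterator is turned back into a sequence. The variable names are illustrative only.

```python
seq = 'abc'

# One copy for the forward slice; reversed() then iterates over it lazily.
it = reversed(seq[seq.index('a'):])
tail = list(it)
print(tail)  # ['c', 'b', 'a']

# Turning the iterator back into a string costs a second allocation.
joined = ''.join(reversed(seq[seq.index('b'):]))
print(joined)  # cb
```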

If it isn't, subtract an extra len(seq) from the endpoint:

seq[:seq.index(whatever)-len(seq)-1:-1]
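
Subtracting len(seq) works because the stop then always falls between -len(seq)-1 and -2: it can never be -1, and a stop of -len(seq)-1 is clamped to "before the first item". The sketch below checks the expression against the two-slice form for every position; the function name reverse_tail is hypothetical.

```python
def reverse_tail(seq, x):
    """Reverse slice from the end of seq back to the first x (inclusive), in one slice."""
    return seq[:seq.index(x) - len(seq) - 1:-1]

for seq in ('abc', [10, 20, 30, 40], (1,)):
    for x in seq:
        # The two-slice form serves as the reference result.
        assert reverse_tail(seq, x) == seq[seq.index(x):][::-1]

print(reverse_tail('abc', 'a'))  # cba
print(reverse_tail('abc', 'b'))  # cb
```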

Or just take a forward slice, slice it again to reverse it, and accept the cost of the extra copy. It's probably not your bottleneck.

Whatever you do, leave a comment explaining it so that people don't reintroduce the bug when editing, and write a unit test for this case.
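
A regression test for this case might look like the sketch below. The helper name tail_reversed and the test class are hypothetical, and the two-slice form is used only as a stand-in for whichever implementation is chosen; the point is that the item at index 0 is covered explicitly.

```python
import unittest

def tail_reversed(seq, x):
    # Deliberately two slices: a single seq[-1:i-1:-1] breaks when i == 0,
    # because a stop of -1 means "the last item" rather than "before index 0".
    return seq[seq.index(x):][::-1]

class TailReversedTest(unittest.TestCase):
    def test_item_at_index_zero(self):
        # The naive one-slice expression returns '' here.
        self.assertEqual(tail_reversed('abc', 'a'), 'cba')

    def test_item_in_middle_and_at_end(self):
        self.assertEqual(tail_reversed('abc', 'b'), 'cb')
        self.assertEqual(tail_reversed('abc', 'c'), 'c')

# Run the tests without calling sys.exit(), so this also works when pasted into a session.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TailReversedTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```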

IMHO, seq[seq.index(x):][::-1] is the most readable solution, but here's a way that's a little more efficient.

def sliceback(seq, key):
    pos = seq.index(key)
    return seq[:pos-1 if pos else None:-1]

seq = 'abc'
for k in seq:
    print(k, sliceback(seq, k)) 

output

a cba
b cb
c c

As Budo Zindovic mentions in the comments, .index will raise an exception if the char isn't found in the string. Depending on the context, the code may never be called with a char that's not in seq, but if that is possible we need to handle it. The simplest way to do that is to catch the exception:

def sliceback(seq, key):
    try:
        pos = seq.index(key)
    except ValueError:
        return ''
    return seq[:pos-1 if pos else None:-1]

seq = 'abc'
for k in 'abcd':
    print(k, sliceback(seq, k)) 

output

a cba
b cb
c c
d 

Python exception handling is very efficient. When the exception isn't actually raised, it's faster than the equivalent if-based code, but if the exception is raised more than about 5-10% of the time, it's faster to use an if.
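
One way to check this trade-off on a given machine is a timing sketch like the one below. The function names, the test string, and the miss rates are illustrative assumptions; the crossover point depends on the Python version and the data, so no expected timings are shown.

```python
import timeit

def sliceback_try(seq, key):
    # Ask forgiveness: cheap when the key is usually present.
    try:
        pos = seq.index(key)
    except ValueError:
        return ''
    return seq[:pos - 1 if pos else None:-1]

def sliceback_if(seq, key):
    # Look before you leap: scans seq twice on a hit, but never raises.
    if key not in seq:
        return ''
    pos = seq.index(key)
    return seq[:pos - 1 if pos else None:-1]

seq = 'abcdefghij'
for misses in (0, 5, 20):  # missing keys per 100 lookups
    keys = ['e'] * (100 - misses) + ['?'] * misses
    for func in (sliceback_try, sliceback_if):
        t = timeit.timeit(lambda: [func(seq, k) for k in keys], number=200)
        print(f'{misses:>3}% misses  {func.__name__:<14} {t:.4f}s')
```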

Rather than testing for key before calling seq.index, it's more efficient to use find. Of course, that only works if seq is a string; it won't work if seq is a list because (annoyingly) lists don't have a .find method.

def sliceback(seq, key):
    pos = seq.find(key)
    return '' if pos < 0 else seq[:pos-1 if pos else None:-1]
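
For lists and tuples, which have no .find, a sketch that keeps the same shape can fall back to .index and return an empty slice of the input, so the result has the same type as seq. The name sliceback_any is hypothetical:

```python
def sliceback_any(seq, key):
    """Reverse slice from the end of seq back to the first key; empty if key is absent."""
    try:
        pos = seq.index(key)
    except ValueError:
        return seq[:0]  # empty sequence of the same type as seq
    # A stop of None (not -1) is needed when key is at index 0.
    return seq[:pos - 1 if pos else None:-1]

print(sliceback_any([1, 2, 3], 1))  # [3, 2, 1]
print(sliceback_any((1, 2, 3), 2))  # (3, 2)
print(sliceback_any([1, 2, 3], 9))  # []
```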

You can check pos in the expression that assigns the result, for example:

result = seq[-1:pos-1:-1] if pos > 0 else seq[::-1]

input:

pos = seq.index('a')

output:

cba
