
Python split list if sequence of numbers is found

I've been trying to find a relevant question, though I can't seem to search for the right words, and all I'm finding is how to check whether two lists intersect.

Basically, I need to split a list once a certain sequence of numbers is found, similar to doing str.split(sequence)[0], but with lists instead. I have working code, though it doesn't seem very efficient (also no idea if raising an error was the right way to go about it), and I'm sure there must be a better way to do it.

For the record, long_list could have a length of a few million values, which is why I think iterating through all of them might not be the best idea.

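long_list = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
end_marker = [6,43,23,95]
end_marker_len = len(end_marker)

class SuccessfulTruncate(Exception):
    pass

try:
    counter = 0  # how many consecutive marker elements have matched so far
    for i in range(len(long_list)):
        if long_list[i] == end_marker[counter]:
            counter += 1
        else:
            # Resetting to 0 can skip a match that begins inside a
            # partial match, e.g. marker [6, 43] in [6, 6, 43].
            counter = 0
        if counter == end_marker_len:
            raise SuccessfulTruncate()
except SuccessfulTruncate:
    # The marker ends at index i, so it starts at i + 1 - end_marker_len.
    long_list = long_list[:i + 1 - end_marker_len]
else:
    raise IndexError('sequence not found')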

>>> long_list
[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]
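The counter idea above is a single pass, but resetting to 0 on a mismatch can miss a match that starts inside a partial one (marker [6, 43] in [6, 6, 43] is never found). The Knuth-Morris-Pratt algorithm keeps the single pass and fixes this with a precomputed fallback table. This is a swapped-in technique, shown as a minimal sketch; the function name truncate_at is hypothetical:

```python
def truncate_at(seq, marker):
    """Return seq up to (not including) the first occurrence of marker.

    Hypothetical KMP-based helper; raises ValueError if marker is absent.
    """
    m = len(marker)
    if m == 0:
        return []  # an empty marker matches at index 0
    # fail[j] = length of the longest proper prefix of marker[:j + 1]
    # that is also a suffix of marker[:j + 1].
    fail = [0] * m
    k = 0
    for j in range(1, m):
        while k and marker[j] != marker[k]:
            k = fail[k - 1]
        if marker[j] == marker[k]:
            k += 1
        fail[j] = k

    matched = 0  # like the question's counter, but falls back via fail[]
    for i, x in enumerate(seq):
        while matched and x != marker[matched]:
            matched = fail[matched - 1]
        if x == marker[matched]:
            matched += 1
        if matched == m:
            # The marker ends at index i, so it starts at i + 1 - m.
            return seq[:i + 1 - m]
    raise ValueError('sequence not found')


long_list = [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2, 6, 43, 23, 95, 10, 31, 5, 1, 73]
print(truncate_at(long_list, [6, 43, 23, 95]))
```

Each element of seq is examined a bounded number of times overall, so the scan stays linear in len(seq) even for markers that overlap themselves.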

OK, timing a few answers with a big list of 1 million values (the marker is very near the end):

Tim: 3.55 seconds
Mine: 2.7 seconds
Dan: 0.55 seconds
Andrey: 0.28 seconds
Kasramvd: still executing :P

"I have working code, though it doesn't seem very efficient (also no idea if raising an error was the right way to go about it), and I'm sure there must be a better way to do it."

I addressed the exception raising in my comment on the question:

Instead of raising an exception and catching it in the same try/except, you can omit the try/except and, inside the loop, do if counter == end_marker_len: long_list = long_list[:i + 1 - end_marker_len] followed by break. "Successful" is not a fitting word for an exception name: exceptions are used to indicate that something failed.
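A sketch of that suggestion using Python's for/else: the else branch runs only when the loop finishes without hitting break, so it takes over the not-found case that the custom exception was handling. It keeps the question's simple counter (and its reset limitation) unchanged:

```python
long_list = [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2, 6, 43, 23, 95, 10, 31, 5, 1, 73]
end_marker = [6, 43, 23, 95]
end_marker_len = len(end_marker)

counter = 0
for i, value in enumerate(long_list):
    if value == end_marker[counter]:
        counter += 1
    else:
        counter = 0  # same simple reset as the question's code
    if counter == end_marker_len:
        # The marker ends at index i, so it starts at i + 1 - end_marker_len.
        long_list = long_list[:i + 1 - end_marker_len]
        break  # stop scanning; also skips the else clause below
else:
    # Reached only if the loop never hit `break`.
    raise IndexError('sequence not found')

print(long_list)
```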

Anyway, here is a shorter way:

>>> long_list = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
>>> end_marker = [6,43,23,95]
>>> index = [i for i in range(len(long_list)) if long_list[i:i+len(end_marker)] == end_marker][0]
>>> long_list[:index]
[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]

List comprehension inspired by this post.
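The list comprehension builds the list of every matching index before taking [0], and raises IndexError when there is no match. A sketch that stops at the first match by passing a generator expression to next; using None as the default is an assumption chosen so that a missing marker leaves the list whole (long_list[:None] is the full list):

```python
long_list = [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2, 6, 43, 23, 95, 10, 31, 5, 1, 73]
end_marker = [6, 43, 23, 95]
n = len(end_marker)

# next() pulls from the generator only until the first matching start index.
index = next(
    (i for i in range(len(long_list) - n + 1) if long_list[i:i + n] == end_marker),
    None,  # default when no match: long_list[:None] keeps everything
)
print(long_list[:index])
```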

As a more Pythonic alternative to repeated slicing, you can use itertools.islice inside a generator expression passed to next:

>>> from itertools import islice
>>> M, N = len(long_list), len(end_marker)
>>> long_list[:next((i for i in range(0,M) if list(islice(long_list,i,i+N))==end_marker),0)]
[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]

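Note that if there is no match, next falls back to its default of 0, so the result is long_list[:0], an empty list; pass M (or None) as the default instead to get the whole of long_list back. Also, islice cannot jump straight to position i of a list: it iterates over the first i items on every call, so this approach is quadratic in the length of long_list.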

If the values have a limited range, say they fit in a byte (this can also be adapted to larger types), why not encode the lists so that the string method find can be used:

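import struct

long_list = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
end_marker = [6,43,23,95]

# Pack each value into one unsigned byte ('B'), so byte offsets equal list indices.
long_list_p = struct.pack('B'*len(long_list), *long_list)
end_marker_p = struct.pack('B'*len(end_marker), *end_marker)

idx = long_list_p.find(end_marker_p)  # -1 if the marker is absent
print(long_list[:idx] if idx != -1 else long_list)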

Prints:

[2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2]

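I also tried using bytes, but its find method didn't work (under Python 2, bytes is just an alias for str, so bytes(long_list) is the text "[2, 6, 4, ...]" and find returns a character offset rather than a list index):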

print(long_list[:bytes(long_list).find(bytes(end_marker))])
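In Python 3, bytes(list_of_ints) packs each integer into one byte, so the one-liner above does work there, and byte offsets equal list indices. A minimal sketch, assuming every value is in range(256) (bytes() raises ValueError otherwise), that also guards against find returning -1, which would otherwise silently turn into long_list[:-1]:

```python
long_list = [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2, 6, 43, 23, 95, 10, 31, 5, 1, 73]
end_marker = [6, 43, 23, 95]

# Assumes all values are in range(256); bytes() raises ValueError otherwise.
haystack = bytes(long_list)
needle = bytes(end_marker)

idx = haystack.find(needle)  # byte offset == list index; -1 if absent
truncated = long_list[:idx] if idx != -1 else long_list
print(truncated)
```

The substring search runs in C, which is why this kind of encoding tends to beat Python-level loops on long lists.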

My solution uses the list index method to jump between candidate positions:

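data = [2,6,4,2,7,98,32,5,15,4,2,6,43,23,95,10,31,5,1,73]
brk = [6,43,23,95]
brk_len = len(brk)
brk_idx = 0                     # candidate start position of brk
brk_offset = brk_idx + brk_len  # end of the candidate slice

try:
    # If the slice at the current candidate isn't brk, jump to the
    # next occurrence of brk[0] after it.
    while data[brk_idx:brk_offset] != brk:
        brk_idx = data.index(brk[0], brk_idx + 1)
        brk_offset = brk_idx + brk_len
except ValueError:
    # list.index found no further brk[0], so brk is not in the list
    print("Not found")
else:
    print(data[:brk_idx])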
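The same index-based search wrapped as a reusable function, as a sketch: truncate_before is a hypothetical name, and returning None when the marker is absent is an assumption (the answer above prints "Not found" instead). Because list.index scans in C, the Python-level loop body only runs once per occurrence of the marker's first element:

```python
def truncate_before(seq, marker):
    """Return seq up to the first occurrence of marker, or None if absent."""
    if not marker:
        return []  # an empty marker matches at index 0
    n = len(marker)
    start = 0
    while True:
        try:
            # Jump straight to the next candidate: an occurrence of marker[0].
            start = seq.index(marker[0], start)
        except ValueError:
            return None  # marker[0] no longer occurs, so marker cannot either
        if seq[start:start + n] == marker:
            return seq[:start]
        start += 1  # resume searching just past this candidate


long_list = [2, 6, 4, 2, 7, 98, 32, 5, 15, 4, 2, 6, 43, 23, 95, 10, 31, 5, 1, 73]
print(truncate_before(long_list, [6, 43, 23, 95]))
```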

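Disclaimer: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM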