Slicing list by characters and elements, Python

I have a text column of limited width, and each row is a list of multiple elements delimited by semicolons. I would like to remove all list elements that cause the row to exceed the character limit.

Previously, I was using:

    if len(row[7].split(';')) > 5:
        row[7] = ('; '.join(row[7].split(';')[1:5]).strip())[:45]

This creates two obvious problems:

  • Some lists have fewer than 5 elements but more than 45 characters, so the conditional does not delete the extra elements as it should.
  • List elements get cut off mid-word.
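
Both problems can be reproduced with a small sketch. The wrapper name `old_trim` and the two sample strings are made up for illustration; the body is the snippet above applied to a single cell instead of `row[7]`.

```python
def old_trim(cell):
    # The snippet from above, wrapped in a hypothetical helper for testing.
    if len(cell.split(';')) > 5:
        cell = ('; '.join(cell.split(';')[1:5]).strip())[:45]
    return cell

# Four elements, 52 characters: the element-count check never fires,
# so the row comes back unchanged and still over the limit.
few_but_long = "Foo; Aoicsdeadwcwewrw; owierwicowmwoemow; aoweirwoer"
print(len(old_trim(few_but_long)))

# Seven elements: the [:45] slice counts characters, not elements,
# so the last surviving element is cut in the middle of a word.
many = "alpha; bravo; charlie; deltadeltadeltadelta; echoechoechoechoecho; foxtrot; golf"
print(old_trim(many))
```

Running it also shows two side effects of the snippet: the `[1:5]` slice drops the first element, and joining with `'; '` after splitting on `';'` doubles the spaces.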

This is an example input:

 Foo; Bar; Aoicsdeadwcwewrw; owierwicowmwoemow; aoweirwoer
 ODIFUWE
 acowierwe; asodicjwoer; s; ow; w; w

This is the corresponding example output:

 Foo; Bar; Aoicsdeadwcwewrw
 ODIFUWE
 acowierwe; asodicjwoer; s; ow; w

The limit is 5 elements or 45 characters; if a line exceeds either limit, the trailing elements should be cut off.

I think this generator is the most efficient way to determine where to cut your list of strings:

def limit(iterable, max_num, max_length, padding_length):
    seen_length = -padding_length  # the first value will not be padded so start negative
    for i, s in enumerate(iterable, 1):
        if i > max_num or seen_length + padding_length + len(s) > max_length:
            return
        seen_length += padding_length + len(s)
        yield s

Use it like this:

row[7] = "; ".join(limit(row[7].split("; "), 5, 45, 2))

The generator doesn't join any strings; it just adds their lengths together, so using it with a single join is O(N+M), where N is the number of strings and M is the length of the result string. This is better than gnibbler's solution, which is O(N*M) due to repeated joins. This algorithmic improvement probably doesn't matter much for relatively few, short strings like the ones you describe, but if you were limiting things to, say, 500 items and a length of thousands of characters, you'd probably notice the difference.
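
As a quick check, here is a self-contained sketch that runs the generator over the sample rows from the question. The `trim_row` helper and its default arguments are assumptions added for illustration; `limit` is repeated so the block runs on its own.

```python
def limit(iterable, max_num, max_length, padding_length):
    """Yield leading strings while both the count and length limits hold."""
    seen_length = -padding_length  # the first value is not padded
    for i, s in enumerate(iterable, 1):
        if i > max_num or seen_length + padding_length + len(s) > max_length:
            return
        seen_length += padding_length + len(s)
        yield s

def trim_row(cell, max_num=5, max_length=45, sep="; "):
    # Hypothetical convenience wrapper: split once, filter, join once.
    return sep.join(limit(cell.split(sep), max_num, max_length, len(sep)))

rows = [
    " Foo; Bar; Aoicsdeadwcwewrw; owierwicowmwoemow; aoweirwoer",
    " ODIFUWE",
    " acowierwe; asodicjwoer; s; ow; w; w",
]
trimmed = [trim_row(r) for r in rows]
for line in trimmed:
    print(line)
```

Note that the leading space of each sample row stays attached to the first element, so it counts toward the 45 characters.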

>>> data = """ Foo; Bar; Aoicsdeadwcwewrw; owierwicowmwoemow; aoweirwoer
...  ODIFUWE
...  acowierwe; asodicjwoer; s; ow; w; w""".split("\n")
>>> 
>>> for row in data:
...     row = row.split(";")[:5]
...     res = []
...     for item in row:
...         if len(";".join(res + [item])) > 45: break
...         res.append(item)
...     print(";".join(res))
... 
 Foo; Bar; Aoicsdeadwcwewrw
 ODIFUWE
 acowierwe; asodicjwoer; s; ow; w

Here is a functional breakdown, which should make it more obvious what is going on:

data = [
    " Foo; Bar; Aoicsdeadwcwewrw; owierwicowmwoemow; aoweirwoer",
    " ODIFUWE",
    " acowierwe; asodicjwoer; s; ow; w; w"
]

def first_n_chars(s, break_on, n):
    if len(s) > n:
        return s[:s.rfind(break_on, 0, n + len(break_on))]
    else:
        return s

def first_n_groups(s, break_on, n):
    try:
        end = -1
        for _ in range(n):
            end = s.index(break_on, end+1)
        return s[:end]
    except ValueError:
        return s

fortyfivechars = (first_n_chars (s, '; ', 45) for s in data)
fivegroups     = (first_n_groups(s, '; ', 5)  for s in fortyfivechars)
trimmed_data   = list(fivegroups)

which results in

[' Foo; Bar; Aoicsdeadwcwewrw',
 ' ODIFUWE',
 ' acowierwe; asodicjwoer; s; ow; w']
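
One edge case is worth knowing about in `first_n_chars`: `str.rfind` returns -1 when the delimiter does not occur in the searched range, so a single element longer than the limit is sliced as `s[:-1]` and loses only its last character. A minimal sketch of a guarded variant follows; the name `first_n_chars_safe` is made up, and returning an empty string in that case is an assumption, since the question does not say what should happen to a first element that is too long on its own.

```python
def first_n_chars_safe(s, break_on, n):
    """Hypothetical variant of first_n_chars with an explicit not-found branch."""
    if len(s) <= n:
        return s
    # Allow a delimiter that starts exactly at position n to be found.
    cut = s.rfind(break_on, 0, n + len(break_on))
    if cut == -1:
        # Assumption: no element fits within the limit, so keep nothing.
        return ""
    return s[:cut]

sample = " Foo; Bar; Aoicsdeadwcwewrw; owierwicowmwoemow; aoweirwoer"
print(first_n_chars_safe(sample, "; ", 45))
print(repr(first_n_chars_safe("x" * 50, "; ", 45)))
```

If keeping an oversized first element is preferable, the not-found branch could return `s` or `s[:n]` instead.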

Something like this should work:

def myfilter(x, wmax=5, cmax=45, d=';'):
    words = x.split(d)

    nwords = 0
    nchars = 0
    s = []
    for i in words:
        nwords += 1
        nchars += len(i) + len(d)
        # nchars counts one delimiter more than the joined result would contain,
        # hence the comparison against cmax + len(d)
        if nwords > wmax or nchars > cmax + len(d):
            break
        s.append(i)
    return d.join(s)
