Slicing list by characters and elements, Python
I have a text column of limited width, and each row is a list of multiple elements delimited by semicolons. I would like to remove all list elements that cause the row to exceed the character limit.
Previously, I was using:
if len(row[7].split(';')) > 5:
    row[7] = ('; '.join(row[7].split(';')[1:5]).strip())[:45]
This creates two obvious problems:
This is an example input:
Foo; Bar; Aoicsdeadwcwewrw; owierwicowmwoemow; aoweirwoer
ODIFUWE
acowierwe; asodicjwoer; s; ow; w; w
This is the corresponding example output:
Foo; Bar; Aoicsdeadwcwewrw
ODIFUWE
acowierwe; asodicjwoer; s; ow; w
The limit is 5 elements or 45 characters, and if the line reaches either of these limits the trailing elements should be cut off.
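To make the behaviour of the earlier snippet concrete, here is a small sketch that wraps it in a function (the name old_trim is hypothetical, not from the question) and runs it on two of the example rows. Reading the code, the slice [1:5] starts at index 1 and so drops the first element, rows with five or fewer elements are never length-checked, and the final [:45] can cut in the middle of an element. These may or may not be the exact problems the question had in mind.

```python
def old_trim(cell):
    # The snippet from the question, wrapped in a function (hypothetical name).
    if len(cell.split(';')) > 5:
        cell = ('; '.join(cell.split(';')[1:5]).strip())[:45]
    return cell

six_items = "acowierwe; asodicjwoer; s; ow; w; w"
five_long_items = "Foo; Bar; Aoicsdeadwcwewrw; owierwicowmwoemow; aoweirwoer"

trimmed = old_trim(six_items)
untouched = old_trim(five_long_items)

# The first element is gone because the slice starts at index 1.
print(repr(trimmed))
# Only five elements, so the length limit is never applied to this row.
print(len(untouched), untouched == five_long_items)
```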
I think this generator is the most efficient way to determine where to cut your list of strings:
def limit(iterable, max_num, max_length, padding_length):
    seen_length = -padding_length  # the first value will not be padded so start negative
    for i, s in enumerate(iterable, 1):
        if i > max_num or seen_length + padding_length + len(s) > max_length:
            return
        seen_length += padding_length + len(s)
        yield s
Use it like this:
row[7] = "; ".join(limit(row[7].split("; "), 5, 45, 2))
The generator doesn't join any strings, it just adds their lengths together, so using it with a single join will be O(N+M), where N is the number of strings and M is the length of the result string. This is better than gnibbler's solution, which is O(N*M) due to repeated joins. This algorithmic improvement probably doesn't matter much for relatively short and few strings, like you describe, but if you were trying to limit things to, say, 500 items and a length of thousands of characters, you'd probably notice the difference.
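A rough way to check that claim is to time both approaches on a large row. The sketch below is only illustrative: the wrapper names trim_with_generator and trim_with_repeated_join, the synthetic row, and the limits of 500 items and 5000 characters are assumptions made for this comparison, and actual timings will vary by machine.

```python
import timeit

def limit(iterable, max_num, max_length, padding_length):
    """Yield leading items while the count and joined length stay within bounds."""
    seen_length = -padding_length  # the first item has no separator before it
    for i, s in enumerate(iterable, 1):
        if i > max_num or seen_length + padding_length + len(s) > max_length:
            return
        seen_length += padding_length + len(s)
        yield s

def trim_with_generator(row, max_num, max_length, sep="; "):
    # One split, one pass adding lengths, one join.
    return sep.join(limit(row.split(sep), max_num, max_length, len(sep)))

def trim_with_repeated_join(row, max_num, max_length, sep="; "):
    # Re-joins everything kept so far for each candidate item.
    kept = []
    for item in row.split(sep)[:max_num]:
        if len(sep.join(kept + [item])) > max_length:
            break
        kept.append(item)
    return sep.join(kept)

# Synthetic input (an assumption for this test): 2000 short items.
big_row = "; ".join("item%d" % n for n in range(2000))
fast = trim_with_generator(big_row, 500, 5000)
slow = trim_with_repeated_join(big_row, 500, 5000)
same_result = fast == slow

t_fast = timeit.timeit(lambda: trim_with_generator(big_row, 500, 5000), number=20)
t_slow = timeit.timeit(lambda: trim_with_repeated_join(big_row, 500, 5000), number=20)
print("same result:", same_result)
print("generator: %.4fs, repeated join: %.4fs" % (t_fast, t_slow))
```

Both functions should return the same string; only the amount of work done to get there differs.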
>>> data = """ Foo; Bar; Aoicsdeadwcwewrw; owierwicowmwoemow; aoweirwoer
... ODIFUWE
... acowierwe; asodicjwoer; s; ow; w; w""".split("\n")
>>>
>>> for row in data:
...     row = row.split(";")[:5]
...     res = []
...     for item in row:
...         if len(";".join(res + [item])) > 45: break
...         res.append(item)
...     print(";".join(res))
...
Foo; Bar; Aoicsdeadwcwewrw
ODIFUWE
acowierwe; asodicjwoer; s; ow; w
Here is a functional breakdown, which should make it more obvious what is going on:
data = [
    " Foo; Bar; Aoicsdeadwcwewrw; owierwicowmwoemow; aoweirwoer",
    " ODIFUWE",
    " acowierwe; asodicjwoer; s; ow; w; w"
]

def first_n_chars(s, break_on, n):
    if len(s) > n:
        return s[:s.rfind(break_on, 0, n + len(break_on))]
    else:
        return s

def first_n_groups(s, break_on, n):
    try:
        end = -1
        for _ in range(n):
            end = s.index(break_on, end+1)
        return s[:end]
    except ValueError:
        return s

fortyfivechars = (first_n_chars (s, '; ', 45) for s in data)
fivegroups = (first_n_groups(s, '; ', 5) for s in fortyfivechars)
trimmed_data = list(fivegroups)
which results in
[' Foo; Bar; Aoicsdeadwcwewrw',
' ODIFUWE',
' acowierwe; asodicjwoer; s; ow; w']
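One edge case worth checking in first_n_chars is a row whose first element alone is longer than the limit. In that case str.rfind finds no delimiter and returns -1, so the slice only drops the last character. The sketch below reproduces this and shows one possible guard; the name first_n_chars_guarded and the choice to return an empty string are assumptions, not part of the answer above.

```python
def first_n_chars(s, break_on, n):
    # Helper as given in the answer above.
    if len(s) > n:
        return s[:s.rfind(break_on, 0, n + len(break_on))]
    return s

def first_n_chars_guarded(s, break_on, n):
    # Hypothetical variant: return "" when no delimiter fits within the limit.
    if len(s) <= n:
        return s
    cut = s.rfind(break_on, 0, n + len(break_on))
    return s[:cut] if cut != -1 else ""

oversized = "x" * 50  # a single element, longer than the 45-character limit
unguarded = first_n_chars(oversized, "; ", 45)
guarded = first_n_chars_guarded(oversized, "; ", 45)
print(len(unguarded), repr(guarded))

# On a row that does contain delimiters, both versions agree.
sample = " Foo; Bar; Aoicsdeadwcwewrw; owierwicowmwoemow; aoweirwoer"
print(first_n_chars(sample, "; ", 45) == first_n_chars_guarded(sample, "; ", 45))
```

Returning an empty string matches what the generator-based answer would produce for such a row; keeping a hard-truncated element instead would be an equally defensible choice depending on the data.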
def myfilter(x, wmax=5, cmax=45, d=';'):
    words = x.split(d)
    nwords = 0
    nchars = 0
    s = []
    for i in words:
        nwords += 1
        nchars += len(i) + len(d)
        # nchars counts one trailing delimiter that the joined result won't have
        if nwords > wmax or nchars > cmax + len(d):
            break
        s.append(i)
    return d.join(s)
Something like this should work.