In Python, what is more efficient: modifying lists or strings?

Regardless of ease of use, which is more computationally efficient: constantly slicing lists and appending to them, or taking substrings and doing the same?

As an example, let's say I have two binary strings, "11011" and "01001". If I represent these as lists, I'll be choosing a random "slice" point. Let's say I get 3. I'll take the first 3 characters of the first string and the remaining characters of the second string (so I'd have to slice both) and create a new string out of them.

Would this be done more efficiently by cutting the substrings, or by representing each one as a list ([1, 1, 0, 1, 1]) rather than a string?
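For reference, the operation being asked about is what genetic algorithms call single-point crossover. Below is a minimal sketch of it for both representations. The function names are hypothetical, and drawing the cut from 1..len(a)-1 (so both parents contribute) is an assumption, because the question does not give the range.

```python
import random

def crossover_str(a: str, b: str, cut: int) -> str:
    # Head of the first string plus tail of the second; slicing builds new strings.
    return a[:cut] + b[cut:]

def crossover_list(a: list, b: list, cut: int) -> list:
    # Same operation on lists of ints; slicing and + also build a new list.
    return a[:cut] + b[cut:]

a, b = "11011", "01001"
# Assumption: the cut point is drawn from 1..len(a)-1 so both parents contribute.
cut = random.randint(1, len(a) - 1)
child_s = crossover_str(a, b, cut)
child_l = crossover_list([int(ch) for ch in a], [int(ch) for ch in b], cut)
print(cut, child_s, child_l)
```

Both versions produce the same bits for any cut. With cut = 3, they give "11001" and [1, 1, 0, 0, 1].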

>>> a = "11011"
>>> b = "01001"
>>> import timeit
>>> def strslice():
    return a[:3] + b[3:]

>>> def lstslice():
    return list(a)[:3] + list(b)[3:]
>>> c = list(a)
>>> d = list(b)
>>> def lsts():
    return c[:3] + d[3:]

>>> timeit.timeit(strslice)
0.5103488475836432
>>> timeit.timeit(lstslice)
2.4350100538824613
>>> timeit.timeit(lsts)
1.0648406858527295
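The session above can also be run as a standalone Python 3 script, sketched below. It uses `timeit.repeat` and takes the best of several runs, which damps noise better than a single `timeit.timeit` call. The run counts are arbitrary choices, and the absolute timings will differ from the ones above on other machines and Python versions.

```python
import timeit

a = "11011"
b = "01001"
c = list(a)  # list versions built once, outside the timed code
d = list(b)

def strslice():
    # Slice two strings and concatenate: builds one new string.
    return a[:3] + b[3:]

def lstslice():
    # Converts to lists on every call, so the conversion cost is included.
    return list(a)[:3] + list(b)[3:]

def lsts():
    # Slices pre-built lists: only slicing and concatenation are timed.
    return c[:3] + d[3:]

if __name__ == "__main__":
    for fn in (strslice, lstslice, lsts):
        # Best of 5 runs of 100,000 calls each, to reduce noise.
        best = min(timeit.repeat(fn, number=100_000, repeat=5))
        print(f"{fn.__name__}: {best:.4f} s per 100,000 calls")
```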

timeit is a good tool for micro-benchmarking, but it needs to be used with the utmost care when the operations you want to compare may involve in-place alterations. In that case, you need to include extra operations designed to make the needed copies. So, first, time just the "extra" overhead:

$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b)'
100000 loops, best of 3: 5.01 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b)'
100000 loops, best of 3: 5.06 usec per loop

So making the two brand-new lists we need (to avoid alteration) costs a tad more than 5 microseconds. When focusing on small differences, run things at least 2-3 times to eyeball the uncertainty range. Then:

$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);x=a[:3]+b[3:]'
100000 loops, best of 3: 5.5 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);x=a[:3]+b[3:]'
100000 loops, best of 3: 5.47 usec per loop

String slicing and concatenation in this case can be seen to cost another 410-490 nanoseconds (5.47-5.5 usec minus 5.01-5.06 usec). And:

$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);la[3:]=lb[3:]'
100000 loops, best of 3: 5.99 usec per loop
$ python -mtimeit -s'a="11011";b="01001"' 'la=list(a);lb=list(b);la[3:]=lb[3:]'
100000 loops, best of 3: 5.99 usec per loop

In-place list splicing can be seen to cost 930-980 nanoseconds (5.99 usec minus 5.01-5.06 usec). The difference is safely above the noise/uncertainty level, so you can reliably state that, for this use case, working with strings takes roughly half as much time as working in place with lists. Of course, it's also crucial to measure a range of use cases that are relevant to, and representative of, your typical bottleneck tasks!
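The baseline-subtraction method above can be scripted, as in the minimal sketch below. It times the copying overhead alone, then times the overhead plus each candidate, and reports the difference. It also prints the spread across repeats as a rough noise band. The helper names and run counts are illustrative choices. The setup and statements are the ones from the command lines above.

```python
import timeit

SETUP = 'a = "11011"; b = "01001"'
BASELINE = "la = list(a); lb = list(b)"  # the copies that keep each loop independent

def usec_per_loop(stmt, number=100_000, repeat=3):
    # One figure per repeat, in microseconds per loop; compare them to gauge noise.
    runs = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return [t / number * 1e6 for t in runs]

def net_usec(extra, number=100_000, repeat=3):
    # Best time for baseline + extra, minus best time for the baseline alone.
    with_extra = min(usec_per_loop(BASELINE + "; " + extra, number, repeat))
    baseline = min(usec_per_loop(BASELINE, number, repeat))
    return with_extra - baseline

def same_result():
    # Sanity check: both candidate statements produce the same bits.
    a, b = "11011", "01001"
    la, lb = list(a), list(b)
    la[3:] = lb[3:]
    return "".join(la) == a[:3] + b[3:]

if __name__ == "__main__":
    assert same_result()
    base = usec_per_loop(BASELINE)
    print(f"baseline: {min(base):.3f} usec (spread {max(base) - min(base):.3f})")
    string_ns = net_usec("x = a[:3] + b[3:]") * 1000
    list_ns = net_usec("la[3:] = lb[3:]") * 1000
    print(f"string slice + concat: {string_ns:.0f} ns")
    print(f"in-place list splice:  {list_ns:.0f} ns")
```

If the extra work is smaller than the run-to-run noise, the net figure can come out near zero or even slightly negative. That is why the spread is worth checking.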

Generally, modifying lists is more efficient than modifying strings, because strings are immutable.
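A small sketch shows what immutability means in practice. "Modifying" a string binds the name to a new object and leaves the old one untouched. `+=` on a list extends the same object in place. The variable names are illustrative.

```python
s = "110"
alias_s = s        # a second name for the same string object
s += "01"          # strings are immutable: this binds s to a new string
print(alias_s, s)  # prints: 110 11001

lst = [1, 1, 0]
alias_l = lst      # a second name for the same list object
lst += [0, 1]      # list += extends the existing list in place
print(alias_l is lst, alias_l)  # prints: True [1, 1, 0, 0, 1]
```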

It really depends on the actual use case, and, as others have said, you should profile it. But in general, appending to lists will be better, because it can be done in place, whereas "appending to a string" actually creates a new string that concatenates the old ones. This can rapidly eat up memory (which, really, is a different issue from computational efficiency).
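The usual idiom that follows from this is to append the pieces to a list and build the string once with `str.join`. PEP 8 recommends this over repeated `+=` because CPython's in-place string concatenation optimization is not guaranteed across implementations. In the sketch below, the loop stands in for any code that produces pieces one at a time, and the function names are hypothetical.

```python
def build_by_concat(pieces):
    # Each += makes a new string in general; CPython can sometimes extend in place,
    # but that is an implementation detail, not a guarantee.
    out = ""
    for piece in pieces:
        out += piece
    return out

def build_by_join(pieces):
    # Append to a list (amortized O(1) per append), then build the string once.
    parts = []
    for piece in pieces:
        parts.append(piece)
    return "".join(parts)

bits = ["1", "1", "0", "0", "1"]
print(build_by_concat(bits), build_by_join(bits))  # prints: 11001 11001
```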

Edit: If you want computational efficiency with binary values, don't use strings or lists. Use integers and bitwise operations. In Python 2.6 and later, you can use binary representations when you need them:

>>> bin(42)
'0b101010'
>>> 0b101010
42
>>> int('101010')
101010
>>> int('101010', 2)
42
>>> int('0b101010')
...
ValueError: invalid literal for int() with base 10: '0b101010'
>>> int('0b101010', 2)
42
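One catch when moving between the two forms: `bin()` drops leading zeros, so "01001" comes back as '0b1001'. The sketch below, with hypothetical helper names, round-trips fixed-width bit strings by using `format()` with a zero-padded `b` format spec.

```python
def to_int(bits: str) -> int:
    # Base-2 parsing; int() also accepts an optional "0b" prefix when base=2.
    return int(bits, 2)

def to_bits(n: int, width: int) -> str:
    # bin() drops leading zeros, so pad to a fixed width with format() instead.
    return format(n, f"0{width}b")

n = to_int("01001")
print(n, bin(n), to_bits(n, 5))  # prints: 9 0b1001 01001
```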

Edit 2:

def strslice(a, b):
    return a[:3] + b[3:]

might be better written something like:

def binsplice(a, b):
    # Top three bits from a, low two bits from b (for 5-bit numbers).
    mask = 0b11100
    return (a & mask) | (b & ~mask)

>>> a = 0b11011
>>> b = 0b01001
>>> bin(binsplice(a, b))
'0b11001'

The mask assumes 5-bit numbers split after the third bit, so it needs to be modified if your binary numbers are different sizes.
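One way to make that modification is to build the mask from the width and the cut point instead of hard-coding 0b11100. The sketch below does this. The function name and its `cut`/`width` parameters are hypothetical. It treats both inputs as `width`-bit numbers and mirrors the string version a[:cut] + b[cut:].

```python
def binsplice_general(a: int, b: int, cut: int, width: int) -> int:
    """Top `cut` bits of `a` plus the remaining low bits of `b`, treating both
    as `width`-bit numbers (the integer analogue of a[:cut] + b[cut:])."""
    if not 0 <= cut <= width:
        raise ValueError("cut must be between 0 and width")
    low_bits = width - cut
    full = (1 << width) - 1                      # e.g. 0b11111 for width 5
    head_mask = full & ~((1 << low_bits) - 1)    # top `cut` bits, e.g. 0b11100
    return (a & head_mask) | (b & ~head_mask & full)

print(format(binsplice_general(0b11011, 0b01001, 3, 5), "05b"))  # prints: 11001
```

For width 5 and cut 3, this reproduces the hard-coded mask above. Because the two masked parts never overlap, `|` and `+` give the same result here.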

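Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.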

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM