Why are mutable strings slower than immutable strings?

Why are mutable strings slower than immutable strings?

EDIT:

# Python 2 only: UserString.MutableString was removed in Python 3.
import UserString

def test():
    s = UserString.MutableString('Python')
    for i in range(3):
        s[0] = 'a'

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print t.timeit()

# Output: 13.5236170292



import UserString

def test():
    s = UserString.MutableString('Python')
    s = 'abcd'
    for i in range(3):
        s = 'a' + s[1:]

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print t.timeit()

# Output: 6.24725079536


import UserString

def test():
    s = UserString.MutableString('Python')
    for i in range(3):
        s = 'a' + s[1:]

if __name__ == '__main__':
    from timeit import Timer
    t = Timer("test()", "from __main__ import test")
    print t.timeit()

# Output: 38.6385951042

I think it is obvious why I put s = UserString.MutableString('Python') in the second test: it makes that test pay the same cost of constructing a MutableString as the other two, so only the loop body differs.

In a hypothetical language that offers both mutable and immutable (but otherwise equivalent) string types, there's no real reason for any performance difference. (I can't really think of such a language offhand: Python and Java both have only immutable strings, plus other ways to build one through mutation, which add indirection and can therefore, of course, slow things down a bit;-). For example, in C++, I would expect interchangeably using a std::string or a const std::string to cause no performance difference. Admittedly, a compiler might be able to optimize code using the latter better by counting on its immutability, but I don't know of any real-world compilers that actually perform such theoretically possible optimizations;-).

Having immutable strings can, and in fact does, allow very substantial optimizations in Java and Python. For example, when a string gets hashed, the hash can be cached and never has to be recomputed (since the string can't change). That's especially important in Python, which uses hashed strings (for look-ups in sets and dictionaries) so lavishly, even "behind the scenes". Fresh copies never need to be made "just in case" the previous one has changed in the meantime: whenever that string is required, references to a single copy can always be handed out. Python also copiously uses "interning" of (some) strings, potentially allowing constant-time comparisons and many other similarly fast operations. Think of it as one more way (a more advanced one, to be sure) to take advantage of strings' immutability by caching more of the results of operations often performed on them.
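CPython already caches the hash inside each str object, but that cache is not visible from Python code. The hypothetical FrozenText class below is a minimal sketch of the same idea: because its text is never changed after construction, the hash is computed once and reused for every later dict or set lookup. The counter exists only to make the caching observable.

```python
class FrozenText:
    """Toy immutable text wrapper that caches its hash (hypothetical class)."""

    hash_computations = 0  # class-wide counter, only for demonstration

    def __init__(self, text):
        self._text = str(text)
        self._hash = None  # computed lazily on the first hash() call

    @property
    def text(self):
        # Read-only property: there is no setter, so callers cannot change it.
        return self._text

    def __eq__(self, other):
        if isinstance(other, FrozenText):
            return self._text == other._text
        return NotImplemented

    def __hash__(self):
        # Caching is safe only because the text never changes after __init__.
        if self._hash is None:
            FrozenText.hash_computations += 1
            self._hash = hash(self._text)
        return self._hash


key = FrozenText("spam")
table = {key: "eggs"}
for _ in range(1000):
    table[key]  # every lookup calls hash(key), but the cached value is reused
print(FrozenText.hash_computations)  # 1
```

A mutable type could not safely do this: any in-place change would silently invalidate the cached value.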

That's not to say that a given compiler is going to take advantage of all possible optimizations, of course. For example, when a slice of a string is requested, there is no real need to make a new object and copy the data over: the new slice might refer to the old one with an offset (and an independently stored length), potentially a great optimization for big strings from which many slices are taken. Python doesn't do that because, unless particular care is taken in memory management, this might easily result in the "big" string being kept entirely in memory when only a small slice of it is actually needed. But it's a tradeoff that a different implementation might well choose to make (with the burden of extra memory management, to be sure, meaning more complex, harder-to-debug compiler and runtime code for the hypothetical language in question).
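Python's own str slicing copies, but its memoryview type applies the offset-plus-length idea to bytes-like objects, so it can illustrate the tradeoff described above. A minimal sketch; the 1 MB size and the slice bounds are arbitrary:

```python
# str slicing: the slice is a new, independent 10-character object.
text = "x" * 1_000_000
piece = text[100:110]

# memoryview slicing: the view records an offset and a length into the
# original buffer instead of copying the bytes.
big = bytes(1_000_000)            # 1 MB of zero bytes
view = memoryview(big)[100:110]   # no copy is made here
print(view.nbytes)                # 10
print(view.obj is big)            # True: the tiny view keeps all of `big` alive

# To let the big buffer be freed, copy out only the part you need.
small = bytes(view)
view.release()
print(small == big[100:110])      # True
```

The view is cheap to create but pins the whole megabyte; the explicit copy costs a little time but frees the big buffer once nothing else refers to it.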

I'm just scratching the surface here. Many of these advantages would be hard to keep if otherwise interchangeable string types existed in both mutable and immutable versions (which, I suspect, is why, to the best of my current knowledge at least, C++ compilers don't actually bother with such optimizations, despite generally being very performance-conscious). But by offering only immutable strings as the primitive, fundamental data type (and thus implicitly accepting some disadvantage when you really need a mutable one;-), languages such as Java and Python can clearly gain all sorts of advantages, performance being only one group of them. (Python's choice to allow only immutable primitive types to be hashable, for example, is not a performance-centered design decision: it's more about clarity and predictability of behavior for sets and dictionaries!-)
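That design choice is easy to see in a Python 3 session; the particular keys below are arbitrary examples:

```python
# Immutable built-ins (str, tuple, frozenset, ...) can be dict keys.
d = {"key": 1, ("a", "b"): 2, frozenset({1, 2}): 3}

# Their mutable counterparts are deliberately unhashable and are rejected.
rejected = []
for candidate in (["a", "b"], {1, 2}, bytearray(b"key")):
    try:
        d[candidate] = 4
    except TypeError:
        rejected.append(type(candidate).__name__)

print(rejected)  # ['list', 'set', 'bytearray']
```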

I don't know whether they are really a lot slower, but much of the time they make reasoning about programs easier, because the state of the object/string can't change. To me, that's the most important property of immutability.
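A small illustration of that point, using bytearray as a stand-in for a mutable string (an assumption for the sake of the example, since Python has no built-in mutable str):

```python
# Two names bound to one immutable str: "changing" it through one name
# rebinds that name to a new object, so the other name is unaffected.
a = "python"
b = a
a += "!"
print(b)  # python

# Two names bound to one mutable bytearray: an in-place change through
# one name is visible through the other, which is easy to overlook.
x = bytearray(b"python")
y = x
x += b"!"
print(y)  # bytearray(b'python!')
```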

Furthermore, you might assume that immutable strings are faster because they have less state that can change, which might mean lower memory consumption and fewer CPU cycles.

While googling, I also found this interesting article, which I'd like to quote:

knowing that a string is immutable makes it easy to lay it out at construction time — fixed and unchanging storage requirements

With an immutable string, Python can intern it and refer to it internally by its address in memory. This means that to compare two strings, it only has to compare their addresses in memory (unless one of them isn't interned). Also, keep in mind that not all strings are interned: I've seen examples of constructed strings that are not interned.
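Interning can also be requested explicitly with sys.intern, which makes the identity-based comparison visible. A minimal sketch; the strings are built with str.join at run time so they are not compile-time constants:

```python
import sys

# Two equal strings constructed at run time. Whether they happen to be the
# same object is a CPython implementation detail, so it isn't relied on here.
a = "".join(["mutable", "-", "string"])
b = "".join(["mutable", "-", "str", "ing"])
print(a == b)  # True: same characters

# sys.intern returns one canonical object per distinct value, so equal
# interned strings can be compared by identity (that is, by address).
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True
```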

With mutable strings, comparison would involve checking them character by character, and would also require either storing identical strings in different locations (malloc is not free) or adding logic to keep track of how many times a given string is referenced, making a copy on every mutation if there is more than one referrer.
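The second option (reference counting plus copy-on-write) can be sketched as follows. CowString and _SharedBuffer are hypothetical toy classes, not part of any library; they only show the extra bookkeeping a mutable string would need to share storage safely:

```python
class _SharedBuffer:
    """Character storage plus a count of handles currently sharing it."""

    def __init__(self, chars):
        self.chars = list(chars)  # always a private copy of the input
        self.refs = 1


class CowString:
    """Toy copy-on-write mutable string (hypothetical, for illustration only).

    Simplification: handles never give their reference back when they are
    garbage-collected, so `refs` can only over-count. That may cause an
    unnecessary copy, but never lets two handles see each other's writes.
    Defining __eq__ without __hash__ leaves the class unhashable, as a
    mutable type should be.
    """

    def __init__(self, text=""):
        self._buf = _SharedBuffer(text)

    def copy(self):
        # Cheap "copy": share the buffer and record one more referrer.
        clone = CowString.__new__(CowString)
        clone._buf = self._buf
        self._buf.refs += 1
        return clone

    def __setitem__(self, index, char):
        # Copy-on-write: detach from a shared buffer before mutating it.
        if self._buf.refs > 1:
            self._buf.refs -= 1
            self._buf = _SharedBuffer(self._buf.chars)
        self._buf.chars[index] = char

    def __eq__(self, other):
        # No interning shortcut here: equality is character by character.
        if isinstance(other, CowString):
            return self._buf.chars == other._buf.chars
        return NotImplemented

    def __str__(self):
        return "".join(self._buf.chars)


a = CowString("Python")
b = a.copy()   # shares storage; nothing is copied yet
print(a == b)  # True
b[0] = "J"     # the first write to a shared buffer triggers the copy
print(a, b)    # Python Jython
```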

It seems that Python is optimized for string comparison. This makes sense, because even string manipulation involves string comparison in most cases, so for most use cases it's the lowest common denominator.

Another advantage of immutable strings is that it makes it possible for them to be hashable, which is a requirement for using them as dictionary keys. Imagine a scenario where they were mutable:

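s = 'a'
d = {s: 1}
s = s + 'b'       # suppose this modified the key object in place...
print(d.get(s))   # ...which entry should this find? (With real, immutable strings: None.)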
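Python won't let you try this with built-in mutable types, but a user-defined class can opt in by defining __hash__ over its mutable contents. The hypothetical MutableKey class below does exactly that, to show what the scenario above would look like in practice:

```python
class MutableKey:
    """Hypothetical mutable string-like key that is (unwisely) hashable."""

    def __init__(self, text):
        self.chars = list(text)

    def append(self, char):
        self.chars.append(char)  # in-place mutation

    def __eq__(self, other):
        if isinstance(other, MutableKey):
            return self.chars == other.chars
        return NotImplemented

    def __hash__(self):
        # The hash depends on the current contents, so it changes on mutation.
        return hash("".join(self.chars))


k = MutableKey("a")
d = {k: 1}     # stored under hash("a")
k.append("b")  # the key object now reads "ab"

print(MutableKey("a") in d)   # False: hash matches, but the stored key now equals "ab"
print(MutableKey("ab") in d)  # False: equal contents, but filed under hash("a")
print(len(d))                 # 1: the entry still exists, yet lookups by value miss it
```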

I suppose Python could keep track of which dicts have which strings as keys and update all of their hash tables whenever a string was modified, but that would just add more overhead to dict insertion. It's not too far off the mark to say that you can't do anything in Python without a dict insertion or lookup, so that would be very, very bad. It would also add overhead to string manipulation.

The obvious answer to your question is that normal strings are implemented in C, while MutableString is implemented in Python.
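Python 3 no longer ships MutableString, so this rough Python 3 port of the question's benchmark uses collections.UserString, the pure-Python str wrapper that remains. It is a sketch for comparing the two on your own machine; no timings are claimed here, and the iteration count is arbitrary:

```python
from collections import UserString
from timeit import timeit


def replace_first(s):
    # Same work as the question's loop: rebuild the string with a new first char.
    for _ in range(3):
        s = "a" + s[1:]
    return s


plain = "Python"
wrapped = UserString("Python")  # pure-Python wrapper around a str

# Both produce the same text; only the cost per operation differs.
print(replace_first(plain), replace_first(wrapped))  # aython aython

n = 20_000
t_plain = timeit(lambda: replace_first(plain), number=n)
t_wrapped = timeit(lambda: replace_first(wrapped), number=n)
print(f"str:        {t_plain:.3f} s for {n} calls")
print(f"UserString: {t_wrapped:.3f} s for {n} calls")
```

With the wrapper, each slice and concatenation goes through Python-level methods (__getitem__, __radd__) before reaching the C implementation of str.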

Not only does every operation on a mutable string carry the overhead of one or more Python function calls, but the implementation is essentially a wrapper around an immutable string: when you modify the string, it creates a new immutable string and throws the old one away. You can read the source in the UserString.py file in your Python 2 lib directory.
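A stripped-down sketch of that wrapper pattern follows. TinyMutableString is hypothetical and is not copied from UserString.py; it only shows why a single-character write still builds a whole new immutable string:

```python
class TinyMutableString:
    """Hypothetical stand-in for a MutableString-style wrapper.

    The text lives in an ordinary immutable str; "mutating" it really builds
    a brand-new str and drops the old one.
    """

    def __init__(self, text=""):
        self.data = str(text)

    def __setitem__(self, index, char):
        if not -len(self.data) <= index < len(self.data):
            raise IndexError("string index out of range")
        index %= len(self.data)  # normalise negative indices
        # Two slices and two concatenations, all inside a Python-level call.
        self.data = self.data[:index] + char + self.data[index + 1:]

    def __str__(self):
        return self.data


s = TinyMutableString("Python")
before = s.data
s[0] = "a"
print(s, before)  # aython Python
```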

To quote the Python docs:

Note:

This UserString class from this module is available for backward compatibility only. If you are writing code that does not need to work with versions of Python earlier than Python 2.2, please consider subclassing directly from the built-in str type instead of using UserString (there is no built-in equivalent to MutableString).

This module defines a class that acts as a wrapper around string objects. It is a useful base class for your own string-like classes, which can inherit from them and override existing methods or add new ones. In this way one can add new behaviors to strings.

It should be noted that these classes are highly inefficient compared to real string or Unicode objects; this is especially the case for MutableString.

(Emphasis added.)

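Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.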

 
© 2020-2024 STACKOOM.COM