
Python: Identical strings (or numbers) with unique ids?

Python is wonderfully optimized, but I have a case where I'd like to work around it. It seems that for small numbers and strings, Python will automatically collapse multiple objects into one. For example:

>>> a = 1
>>> b = 1
>>> id(a) == id(b)
True
>>> a = str(a)
>>> b = str(b)
>>> id(a) == id(b)
True
>>> a += 'foobar'
>>> b += 'foobar'
>>> id(a) == id(b)
False
>>> a = a[:-6]
>>> b = b[:-6]
>>> id(a) == id(b)
True

I have a case where I'm comparing objects based on their Python ids. This works really well except for the few cases where I run into small numbers. Does anyone know how to turn off this optimization for specific strings and integers? Something akin to an anti-intern()?

You can't turn it off without recompiling your own version of CPython.

But if you want to have "separate" versions of the same small integers, you can do that by maintaining your own id (for example a uuid4) associated with the object.

Since ints and strings are immutable, there's no obvious reason to do this. If you can't modify the object at all, you shouldn't care whether you have the "original" or a copy, because there is no use case where it can make any difference.
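The "maintain your own id" idea could look roughly like the following. This is only a sketch: the Tagged class and its attribute names are made up for illustration and are not from the answer. Each wrapper gets its own uuid4, so two wrappers around the same small integer can be told apart by tag rather than by id().

```python
import uuid


class Tagged:
    """Hypothetical helper: pairs a value with its own unique tag."""

    def __init__(self, value):
        self.value = value
        # A fresh random UUID per wrapper, independent of the value's id().
        self.tag = uuid.uuid4()


a = Tagged(1)
b = Tagged(1)

same_value = a.value == b.value  # the wrapped values compare equal
same_tag = a.tag == b.tag        # the tags differ, so the wrappers are distinguishable
```

Comparisons that used to rely on id(obj) would compare the tag attribute instead.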

Related: How to create the int 1 at two different memory locations?

Sure, it can be done, but it's never really a good idea:

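# Global counter used to number each MyString instance.
Z = 1

class MyString(str):  # subclass the built-in str type
    def __init__(self, *args):
        global Z
        # str builds its value in __new__, so __init__ takes no extra arguments.
        super().__init__()
        self.i = Z
        Z += 1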

>>> a = MyString("1")
>>> b = MyString("1")
>>> a is b
False

By the way, to check whether two objects have the same id, just use a is b instead of id(a) == id(b).
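A small illustration of the difference between identity (is) and equality (==), using lists because two separately built lists are always distinct objects. The variable names are just for this example.

```python
first = [1, 2]
second = [1, 2]   # equal contents, but a separately created list
alias = first     # a second name bound to the same list object

equal_but_distinct = (first == second) and (first is not second)
same_object = alias is first
```

Here first and second are equal but not identical, while alias and first are one and the same object.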

The Python documentation on id() says:

Return the "identity" of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

CPython implementation detail: This is the address of the object in memory.

So it's guaranteed to be unique among objects that are alive at the same time, and it must be intended as a way to tell whether two variables are bound to the same object.
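One practical consequence of the "during its lifetime" wording: if ids are stored and compared later, the objects need to stay alive, otherwise a new object could end up with a reused id. A minimal sketch, assuming a hypothetical IdentityRegistry helper (not part of any library) that keeps a reference next to each id:

```python
class IdentityRegistry:
    """Hypothetical helper: remembers objects by id() and keeps them alive."""

    def __init__(self):
        self._by_id = {}

    def add(self, obj):
        # Storing obj itself holds a reference, so its id cannot be
        # handed to another object while it is registered.
        self._by_id[id(obj)] = obj
        return id(obj)

    def get(self, ident):
        return self._by_id[ident]


registry = IdentityRegistry()
x = object()
y = object()
x_id = registry.add(x)
y_id = registry.add(y)
```

Because x and y are both alive, their ids differ, and looking up x_id gives back exactly x.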

In a comment on StackOverflow here, Alex Martelli says the CPython implementation is not the authoritative Python, that other correct implementations of Python can and do behave differently in some ways, and that the Python Language Reference (PLR) is the closest thing Python has to a definitive specification.

In the PLR section on objects it says much the same:

Every object has an identity, a type and a value. An object's identity never changes once it has been created; you may think of it as the object's address in memory. The 'is' operator compares the identity of two objects; the id() function returns an integer representing its identity (currently implemented as its address).

The language reference doesn't say it's guaranteed to be unique. It also says (re: the object's lifetime):

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether - it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

and:

CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).

This isn't actually an answer; I was hoping this would end up somewhere conclusive. But I don't want to delete it now that I've quoted and cited.

I'll go with turning your premise around: "python will automatically collapse multiple objects into one" - no, it won't. They were never multiple objects, and they can't be, because they have the same id().

If id() is Python's definitive answer on whether two objects are the same or different, your premise is incorrect: this isn't an optimization, it's a fundamental part of Python's view of the world.

You shouldn't be relying on these objects to be different objects at all. There's no way to turn this behavior off without modifying and recompiling Python, and which particular objects it applies to is subject to change without notice.

This version accounts for wim's concerns about more aggressive interning in the future. It will use more memory, which is why I discarded it originally, but it is probably more future-proof.

>>> class Wrapper(object):
...     def __init__(self, obj):
...         self.obj = obj
...

>>> a = 1
>>> b = 1
>>> aWrapped = Wrapper(a)
>>> bWrapped = Wrapper(b)
>>> aWrapped is bWrapped
False
>>> aUnWrapped = aWrapped.obj
>>> bUnwrapped = bWrapped.obj
>>> aUnWrapped is bUnwrapped
True

Or a version that works like the pickle answer (wrap + pickle = wrapple):

class Wrapple(object):
    def __init__(self, obj):
        self.obj = obj

    @staticmethod
    def dumps(obj):
        return Wrapple(obj)

    def loads(self):
        return self.obj

aWrapped = Wrapple.dumps(a)
aUnWrapped = aWrapped.loads()  # loads() is an instance method, so call it on the wrapper

Well, seeing as no one posted a response that was useful, I'll just let you know what I ended up doing.

First, some friendly advice to someone who might read this one day. This is not recommended for normal use, so if you're contemplating it, ask yourself if you have a really good reason. There are good reasons, but they are rare, and if someone says there aren't any, they just aren't thinking hard enough.

In the end, I just used pickle.dumps() on all the objects and passed the output in instead of the real object. On the other side I checked the id and then used pickle.loads() to restore the object. The nice part of this solution is that it works for all types, including None and Booleans.

>>> import pickle
>>> a = 1
>>> b = 1
>>> a is b
True
>>> aPickled = pickle.dumps(a)
>>> bPickled = pickle.dumps(b)
>>> aPickled is bPickled
False
>>> aUnPickled = pickle.loads(aPickled)
>>> bUnPickled = pickle.loads(bPickled)
>>> aUnPickled is bUnPickled
True
>>> aUnPickled
1
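The same round trip can be sketched for None and Booleans, which are singletons and so can never be given distinct ids directly. The box and unbox names below are made up for this illustration; they just wrap pickle.dumps() and pickle.loads(). The identity checks assume CPython, where each dumps() call returns a new bytes object.

```python
import pickle


def box(obj):
    """Hypothetical helper: a fresh bytes token standing in for obj."""
    return pickle.dumps(obj)


def unbox(token):
    """Hypothetical helper: recover the original value from a token."""
    return pickle.loads(token)


# Two tokens for None and one for True, all kept alive in a list.
tokens = [box(None), box(None), box(True)]

distinct_tokens = tokens[0] is not tokens[1]  # separate objects, equal contents
restored = [unbox(t) for t in tokens]
```

The tokens can be compared by id while they are alive, and unboxing gives back the usual singletons.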
