
How to replace Django's primary key with a different integer that is unique for that table

I have a Django web application that uses the default auto-incrementing positive integers as the primary key. This key is used throughout the application and is frequently inserted into URLs. I don't want to expose this number to the public, because it would let them guess the number of users or other entities in my database.

This is a frequent requirement, and I have seen questions similar to mine with answers. Most solutions recommend hashing the original primary key value. However, none of those answers fits my needs exactly. These are my requirements:

  1. I would like to keep the primary key field type as Integer.
  2. I would also prefer not to have to hash/unhash this value every time it is read, written, or compared against the database. That seems wasteful. It would be nice to do it just once: when the record is initially inserted into the database.
  3. The hashing/encryption function need not be reversible, since I don't need to recover the original sequential key. The hashed value just needs to be unique.
  4. The hashed value needs to be unique only for that table, not universally unique.
  5. The hashed value should be as short as possible. I would like to avoid extremely long (20+ characters) URLs.

What is the best way to achieve this? Would the following work?

import logging

from django.db import models
from django.db.models.signals import post_save

logger = logging.getLogger(__name__)


def hash_function(value):
    # Placeholder: which function should I use here?
    raise NotImplementedError


def obfuscate_pk(sender, instance, created, **kwargs):
    if created:
        logger.info("MyClass #%s, created with created=%s: %s", instance.pk, created, instance)
        instance.pk = hash_function(instance.pk)
        instance.save()
        logger.info("\tNew Pk=%s", instance.pk)


class MyClass(models.Model):
    blahblah = models.CharField(max_length=50, null=False, blank=False)


post_save.connect(obfuscate_pk, sender=MyClass)

The Idea

I would recommend the same approach that Instagram uses. Their requirements seem to closely match yours.

  1. Generated IDs should be sortable by time (so a list of photo IDs, for example, could be sorted without fetching more information about the photos).
  2. IDs should ideally be 64 bits (for smaller indexes, and better storage in systems like Redis).
  3. The system should introduce as few new 'moving parts' as possible: a large part of how we've been able to scale Instagram with very few engineers is by choosing simple, easy-to-understand solutions that we trust.

They came up with a system that uses 41 bits for the timestamp, 13 for the database shard, and 10 for an auto-increment portion. Since you don't appear to be using shards, you can just use 41 bits for a time-based component and 23 bits chosen at random. That leaves a roughly 1 in 8.4 million (1 in 2^23) chance of a conflict if two records are inserted in the same millisecond, so in practice you are unlikely to ever hit it. Right, so here is some code.
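The collision figure above can be checked with a little arithmetic. The following is only a sketch: the helper name `collision_probability` is made up for illustration, and it assumes the 23 random bits are uniformly distributed and that `k` records are created within the same millisecond.

```python
RANDOM_BITS = 23
SPACE = 2 ** RANDOM_BITS  # 8,388,608 possible random suffixes per millisecond


def collision_probability(k: int) -> float:
    """Probability that at least two of k IDs created in the same
    millisecond share the same random suffix (birthday problem)."""
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (SPACE - i) / SPACE
    return 1.0 - p_all_distinct


print(SPACE)                     # 8388608
print(collision_probability(2))  # 1 / 2**23, about 1.19e-07
print(collision_probability(100))
```

With two simultaneous inserts the chance is exactly 1 in 2^23; it grows with the number of records written in the same millisecond, so a very write-heavy table may want a retry on a unique-constraint violation.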

Generating IDs

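import random
import time

# Arbitrary fixed epoch in milliseconds (example: 2015-01-01 00:00:00 UTC).
START_TIME = 1420070400000


def make_id():
    '''
    inspired by http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
    '''
    # Milliseconds since START_TIME go in the high bits.
    t = int(time.time() * 1000) - START_TIME
    # 23 random bits go in the low bits.
    u = random.SystemRandom().getrandbits(23)
    return (t << 23) | u


def reverse_id(obj_id):
    # Drop the random bits and return the creation time in Unix milliseconds.
    t = obj_id >> 23
    return t + START_TIME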

Note that START_TIME in the above code is an arbitrary starting time in milliseconds. You can evaluate int(time.time()*1000) once and hard-code the result as START_TIME; it must not change once IDs have been generated.
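How long the scheme keeps working after START_TIME follows from the bit budget. The arithmetic below is a rough sketch; it assumes the ID is stored in a signed 64-bit integer column (maximum 2^63 - 1), and the constant names are illustrative.

```python
TIME_BITS = 41
RANDOM_BITS = 23
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000

# Years of millisecond timestamps that fit in 41 bits.
years_unsigned = 2 ** TIME_BITS / MS_PER_YEAR

# A signed 64-bit column tops out at 2**63 - 1, so the time offset
# must stay below 2**40 for the shifted ID to remain storable.
SIGNED_MAX = 2 ** 63 - 1
max_t = SIGNED_MAX >> RANDOM_BITS  # largest safe time offset in ms
years_signed = (max_t + 1) / MS_PER_YEAR

print(round(years_unsigned, 1))  # 69.7
print(round(years_signed, 1))    # 34.8
```

So picking a START_TIME close to the launch date of the application, rather than the Unix epoch, leaves the most headroom.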

Notice that the reverse_id method I have posted lets you find out when the record was created. If you need to keep track of that information, you can do so without adding another field for it. So your primary key actually saves storage rather than increasing it!
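As a small, self-contained illustration of recovering the creation time, the sketch below generates an ID and decodes it into a datetime. The START_TIME value and the helper name `created_at` are assumptions chosen for the example, not part of the original code.

```python
import random
import time
from datetime import datetime, timezone

START_TIME = 1420070400000  # assumed example epoch: 2015-01-01 UTC, in ms
RANDOM_BITS = 23


def make_id() -> int:
    """Time offset in the high bits, 23 random bits in the low bits."""
    t = int(time.time() * 1000) - START_TIME
    return (t << RANDOM_BITS) | random.SystemRandom().getrandbits(RANDOM_BITS)


def created_at(obj_id: int) -> datetime:
    """Recover the creation time (millisecond precision) from an ID."""
    ms = (obj_id >> RANDOM_BITS) + START_TIME
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


new_id = make_id()
print(new_id, created_at(new_id).isoformat())
```

Because the timestamp occupies the high bits, sorting by ID also sorts records created in different milliseconds by creation time.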

The Model

Now this is what your model would look like.

class MyClass(models.Model):
    # Assumes make_id is defined in a module named fields.
    id = models.BigIntegerField(default=fields.make_id, primary_key=True)

If you make changes to your database outside Django, you will need to create the equivalent of make_id as an SQL function.

As a footnote, this is somewhat like the approach MongoDB uses to generate its _id for each object.

You need to separate two concerns:

  1. The primary key, currently an auto-incrementing integer, is the best choice for a simple, relatively predictable unique identifier that can be enforced at the database level.

  2. That does not mean you have to expose it to users in your URLs.

I'd recommend adding a new UUID field to your model and remapping your views to use it, instead of the PK, for object lookups.
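The idea of keeping the integer primary key internal and looking objects up by a separate public identifier can be sketched without any framework. The example below uses sqlite3 and uuid from the standard library as a stand-in for the Django model; the table name `my_class`, the column name `public_id`, and the helper functions are all hypothetical. In Django the same role would typically be played by a unique UUIDField on the model.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE my_class (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal only
        public_id TEXT NOT NULL UNIQUE,               -- exposed in URLs
        blahblah  TEXT NOT NULL
    )
    """
)


def create(blahblah: str) -> str:
    """Insert a row and return the identifier that is safe to expose."""
    public_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO my_class (public_id, blahblah) VALUES (?, ?)",
        (public_id, blahblah),
    )
    return public_id


def lookup(public_id: str):
    """Find a row by its public identifier instead of its primary key."""
    return conn.execute(
        "SELECT id, blahblah FROM my_class WHERE public_id = ?",
        (public_id,),
    ).fetchone()


pid = create("first")
print(pid, lookup(pid))
```

Note that the hex form of a UUID is 32 characters, which is longer than the URL length the question asks for, so this trades brevity for simplicity.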

A really simple solution is to encrypt the ID before sending it out to an external source. You can decrypt it on the way back in.
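One way to picture this is a small keyed, reversible mapping from one 64-bit integer to another, built here as a four-round Feistel network on top of HMAC-SHA256 from the standard library. This is only an illustrative sketch and not a vetted cipher: the names `encode_id` and `decode_id`, the key, and the round count are assumptions, and it assumes primary keys in the range 0 to 2^64 - 1. A reviewed cryptography library would be the safer choice for real use.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; keep out of source control
ROUNDS = 4
HALF_BITS = 32
MASK = (1 << HALF_BITS) - 1


def _round(value: int, round_no: int) -> int:
    """Keyed round function: HMAC-SHA256 truncated to 32 bits."""
    msg = bytes([round_no]) + value.to_bytes(4, "big")
    digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def encode_id(pk: int) -> int:
    """Map a 64-bit primary key to another 64-bit integer (reversible)."""
    left, right = (pk >> HALF_BITS) & MASK, pk & MASK
    for i in range(ROUNDS):
        left, right = right, left ^ _round(right, i)
    return (left << HALF_BITS) | right


def decode_id(token: int) -> int:
    """Undo encode_id by running the rounds in reverse order."""
    left, right = (token >> HALF_BITS) & MASK, token & MASK
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(left, i), left
    return (left << HALF_BITS) | right


token = encode_id(42)
print(token, decode_id(token))
```

The database keeps its sequential key; only the value placed in URLs is encoded, and views call `decode_id` before querying. The result is still an integer of at most 20 decimal digits.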

Keep the AUTO_INCREMENT, but pass it around in a semi-secret way: in a cookie. It takes a bit of coding to establish the cookie, set it, and read it. But cookies are hidden from all but serious hackers.


 