How to create an irreversible unique ID from a 10-character number/letter medical ID?

I would like to create a unique ID from a medical ID. It sounds like a common problem, but I haven't been able to find the topic on Stack Overflow or via Google. I'm new to Python, so a code example would be great!

I've got several dataframes with up to 4 million rows covering 5,000-6,000 different patients, and I would like to be able to add more patients (a maximum of 5 million unique patients) with the same code and the same chance of uniqueness. In total, the final merged dataset has up to 10 million rows.

It should be near impossible to reverse engineer the generated unique ID, even if you know the format of the medical ID.

The medical ID consists of the birthday (YYMMDD) and four characters, each of which is a digit (0-9) or a letter (A-Z).

I've read the following posts on the subject, and some questions remain unanswered:

Irreversible unique ID from String: this one describes the possibility of using rainbow tables to reverse engineer the unique ID, and using a salt to get around that. Unfortunately, salt is something I've never worked with.

https://www.sohamkamani.com/uuid-versions-explained/ If I use UUID v1, it depends on the current computer's MAC address, which is not an option, as the same unique ID should be generated regardless of which computer it is generated on. I can't really get my head around the possibility of reverse engineering the unique ID using UUID v4 and a rainbow table, since for a person with the right knowledge it would be quite easy to figure out the medical ID system.

Generate ID from string in Python: using a hash, wouldn't that be easily reverse engineered?

How to generate 8 digit unique identifier to replace the existing one in python pandas

So my requirements are:

  1. A unique ID generated from a medical ID.
  2. No possible way to reverse it with a rainbow table (very important, as it is sensitive information).
  3. Very little risk of collision when generating the unique ID.
  4. Not dependent on the MAC address or other things unique to a computer.
  5. The same unique ID is generated from the same medical ID, independent of which computer it is generated on.
  6. Ideally a unique ID of 10-20 digits, with no letters. But if it needs to be longer, with both letters (A-Z) and numbers (0-9), so be it :)

Does any solution fit the above-mentioned requirements? Could you kindly provide a code example, if none of the links above already has what I need?

Example: (DDMMYYXXXX) fictitious IDs of persons born in 2022

   Medical ID  Bloodsample Date
0  0101221234  5.2         
1  0101224321  6.2         
2  311222R09B  7.6         
3  0203221234  3.8         
4  311222R09B  5.7         
5  0405229082  9.5        
6  1012225879  7.2         
7  2801226787  5.2         
8  2706221HF9  6.3         
9  3112228768  4.6         

0 and 3, and 2 and 4 are the same patients. 4 and 7 are not the same patient.

DDMMYYXXXX

This information gives away the birth date, which may be a significant hint for identifying a small group of people, so it is not really suitable for anonymization.

A unique ID generated from a medical ID...

What you are looking for may be a hash function. Cryptographic hash functions, such as SHA-256, are collision resistant. This means the probability of generating the same hash value for different inputs should be negligible (although mathematically never zero).
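To put a rough number on the collision risk for the sizes mentioned in the question, the birthday approximation can be used. The sketch below is only an estimate: it assumes that a hash truncated to a given number of decimal digits behaves like a uniformly random value, and the function name `collision_probability` is made up for illustration.

```python
import math

def collision_probability(n_items: int, n_possible_ids: int) -> float:
    """Birthday approximation: P(at least one collision) ~ 1 - exp(-n(n-1)/(2d))."""
    exponent = -n_items * (n_items - 1) / (2 * n_possible_ids)
    return -math.expm1(exponent)  # expm1 keeps precision for tiny probabilities

n_patients = 5_000_000  # upper bound stated in the question
for digits in (10, 15, 18, 20):
    p = collision_probability(n_patients, 10 ** digits)
    print(f"{digits} digits: {p:.3g}")
```

Under these assumptions, a 10-digit ID is practically guaranteed to collide somewhere among 5 million patients, while an 18-digit ID gives a chance of roughly 1 in 80,000 that any collision occurs at all.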

No possible way to reverse it with a rainbow table (very important, as it is sensitive information).

A cryptographic hash makes it computationally infeasible to invert the value directly.

Rainbow tables are effective when the set of input values is known. For the input format DDMMYYXXXX, it should be possible to generate all possible values in reasonable time to create a reverse-lookup table.
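To illustrate how small this input space is: if an attacker knows or guesses the birth date, only the four trailing characters remain, which is 36^4 = 1,679,616 candidates when each one is a digit or an uppercase letter. The sketch below brute-forces them against a plain, unkeyed SHA-256 hash. The helper names are hypothetical, and the target ID is taken from the example table in the question.

```python
import hashlib
import string
from itertools import product

ALPHABET = string.digits + string.ascii_uppercase  # 36 possible symbols

def sha256_hex(text: str) -> str:
    """Plain SHA-256 with no key and no salt."""
    return hashlib.sha256(text.encode("ascii")).hexdigest()

def recover_id(target_hash: str, birthdate: str, alphabet: str = ALPHABET):
    """Try every 4-character suffix for a known 6-digit birth date."""
    for suffix in product(alphabet, repeat=4):
        candidate = birthdate + "".join(suffix)
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None  # no suffix over this alphabet matched

leaked = sha256_hex("311222R09B")      # what an unkeyed "pseudonym" would look like
print(recover_id(leaked, "311222"))    # prints 311222R09B
```

Even in pure Python this search typically finishes in seconds, and repeating it for every date in a century only multiplies the work by a few tens of thousands, which is why an unkeyed hash does not meet the "no reversal" requirement for this ID format.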

In that case you may try to use HMAC, which is a hash function with a secret key.

Unfortunately, Python is not my native language, so you will have to consult your favourite search engine to find an implementation.
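For reference, a minimal sketch of the HMAC idea using only Python's standard `hmac` and `hashlib` modules could look like the following. It is not the only way to do this. The function name `pseudonymize`, the 18-digit output length, and the placeholder key are assumptions made for illustration; a real key would be generated once (for example with `secrets.token_bytes(32)`), stored securely, and shared by every machine that has to produce the same IDs.

```python
import hashlib
import hmac

# ASSUMPTION: placeholder key for demonstration only. Use a randomly
# generated secret in practice and keep it out of the source code.
SECRET_KEY = bytes.fromhex("00" * 32)

def pseudonymize(medical_id: str, key: bytes = SECRET_KEY, digits: int = 18) -> str:
    """Return a fixed-length, digits-only pseudonym derived with HMAC-SHA256."""
    normalized = medical_id.strip().upper()  # same ID -> same bytes on every machine
    mac = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).digest()
    number = int.from_bytes(mac, "big") % 10 ** digits  # truncate to `digits` decimals
    return str(number).zfill(digits)

ids = ["0101221234", "311222R09B", "311222R09B"]
pseudonyms = [pseudonymize(i) for i in ids]
print(pseudonyms)
```

The output depends only on the key and the medical ID, so it is identical on any computer that has the key, and the two `311222R09B` rows map to the same pseudonym. With pandas, the same function can be applied to a column via `df["Medical ID"].map(pseudonymize)`. Anyone who obtains the key can rebuild a lookup table, so the protection is exactly as strong as the secrecy of the key.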
