
Generate unique id using strings

I am parsing data from multiple sources and want to assign a unique string id to each entry. Each entry contains a title (string), a url (string), and a body (string). The same title can come from multiple sources, but those entries will have different urls, and in that case I would like to store both items. I am thinking of hashing the title and url together and using the hash as the id. That way, if I get the same title and url from different sources, the id will be the same and I can identify the entry as a duplicate.

import hashlib
title, url = "Some title", "https://example.com/post"  # placeholder values
entry_id = hashlib.sha256(f"{title} {url}".encode("utf-8")).hexdigest()

But I think there could be a case where two different title/url combinations generate the same hash, and I am not sure how to avoid that clash. Can someone suggest a way to generate a unique identifier from strings? I don't want to use a timestamp, because I might get the same row from different sources at different times.

You're safe: in practice, you won't see two different title/url combinations generate the same hash with SHA-256.
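
Here is a minimal sketch of that approach, assuming each entry gives you `title` and `url` as plain strings. The helper name `make_entry_id` and the length-prefix encoding are my own illustrative choices, not something from the question. The prefix just makes sure that pairs like ("ab", "c") and ("a", "bc") are never fed to the hash as the same bytes, which a simple join with a space or no separator cannot guarantee.

```python
import hashlib


def make_entry_id(title: str, url: str) -> str:
    """Return a deterministic hex id for a (title, url) pair.

    Hypothetical helper: each field is prefixed with its byte length so
    that different (title, url) pairs never produce the same input bytes.
    """
    h = hashlib.sha256()
    for field in (title, url):
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))  # 8-byte length prefix
        h.update(data)
    return h.hexdigest()


# Same title and url from two sources -> same id (duplicate detected).
id_a = make_entry_id("Hello", "https://example.com/a")
id_b = make_entry_id("Hello", "https://example.com/a")
# Same title, different url -> different id (both entries kept).
id_c = make_entry_id("Hello", "https://example.com/b")
```

If the urls can differ only trivially (trailing slash, tracking parameters), you would probably want to normalize them before hashing, but that depends on your data.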


SHA-256 is a cryptographic hash function from the SHA-2 family, which NIST first published in 2001 and which remains a current standard (FIPS 180-4).

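The probability that two given distinct inputs produce the same output is about 1/(2^256), roughly 8.6e-78. Even across a whole dataset, the birthday bound says you would need on the order of 2^128 (about 3.4e38) entries before a collision becomes likely.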
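
To put a number on your own dataset size, the usual birthday approximation estimates the chance of at least one collision among n random b-bit hashes as roughly n(n-1)/2^(b+1). The sketch below is only a rough estimate under the assumption that SHA-256 outputs behave like uniformly random values; `collision_probability` is an illustrative name.

```python
from fractions import Fraction


def collision_probability(n: int, bits: int = 256) -> float:
    """Approximate chance of at least one collision among n random hashes.

    Uses the birthday approximation p ~= n*(n-1) / 2^(bits+1), which is
    only valid while p is small.
    """
    return float(Fraction(n * (n - 1), 2 ** (bits + 1)))


# One billion entries hashed with SHA-256.
p = collision_probability(10**9)
print(p)
```

For a billion entries this comes out at roughly 4e-60, so accidental collisions are not something you need to handle in practice.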


See: SHA-256 collisions on crypto.stackexchange
