
Create unique identifier for files using Python

I am looking for a robust way to define a unique identifier for measurement data files. I collect the data from different sources, mainly from network storage. The data files might be renamed and copied more than once to different locations. The method only needs to run on Windows. So far I do the following: I create an ID from the last modification time and the size of the file. I assume that each file is created only once during the measurement process and is never modified afterwards. This is my current implementation:

import pathlib
import datetime

def file_uid(file):
    """Build an ID from the file's last modification time and its size."""
    # Call stat() once so both values come from the same snapshot.
    stats = pathlib.Path(file).stat()
    mod_time = datetime.datetime.fromtimestamp(stats.st_mtime).strftime("%d.%m.%Y %H:%M:%S")
    return f"{mod_time}_{stats.st_size}"

Can this idea work, or did I miss something in general? What is the best practice for a robust solution to this problem? Or should I use a checksum algorithm, and if so, which one would be recommended?
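For comparison, the checksum idea raised in the question could look roughly like the sketch below. It hashes the file's bytes with SHA-256 from the standard library, so the ID depends only on the content and not on the name, the location, or the modification time (which some copy tools do not preserve). The function name `file_content_uid` and the `chunk_size` parameter are made up for illustration and are not part of the original code. The trade-off is that the whole file has to be read, which may be slow for large files on network storage.

```python
import hashlib
import pathlib

def file_content_uid(file, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file's bytes (hypothetical helper)."""
    digest = hashlib.sha256()
    with pathlib.Path(file).open("rb") as handle:
        # Read in fixed-size chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two byte-identical copies of a file yield the same ID under this scheme, and any change to the content yields a different one.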

I would recommend assigning each file a short UUID. You can use a library such as shortuuid:

pip install shortuuid

and then just

import shortuuid

shortuuid.ShortUUID().random(length=22)
