简体   繁体   中英

Django: Validate uniqueness of user uploaded files with hashes

I am trying to find an elegant solution to check whether an user uploaded file already exists.

  1. A user uploads a file using a SubmissionForm , which just includes the file field of the Submission class.
  2. This form should compute the hash for the file, add it to the Submission instance, and compute validation checks there.
  3. If the uniqueness constraint on the hash value is violated, this should raise a validation error using the duplicate code, so that the form can print an adequate error message.

The relevant files:

# models.py
class Submission(models.Model):
    exercise = models.ForeignKey(Exercise, on_delete=models.PROTECT)
    file_sha1 = models.CharField(max_length=40, editable=False, unique=True)
    file = models.FileField(
        upload_to=<... file saving ...>,
        validators=[
            <... validators ...>
        ],
    )
# forms.py
class SubmissionForm(forms.models.ModelForm):
    class Meta:
        model = Submission
        fields = ("file",)
        widget = {"file": forms.FileField()}
        error_messages = {
            "file": {
                <...error message codes + messages...>,
                "duplicate": DUPLICATE_ERROR,
            }
        }

    def save(self, exercise):
        self.instance.exercise = exercise
        return super().save()

In addition, I have a _generate_sha1(file) method hashing a file. I saw couple of solutions to this problem using validation in views.py but I believe that is not the right place for it. I looked into clean() and clean_file() but I did not manage to get it working.

Thanks for your help.

A bit late to the party, but I was looking for something similar myself.

I've implemented the checking logic in my form's clean() method. Note I'm manually opening (and closing, if needed) the file as the model's save() method still needs to access it.

Models:

class Image(models.Model):
    image = models.ImageField(upload_to="images/",)
    image_sha1 = models.CharField(max_length=40, editable=False, unique=True)

    def save(self, *args, **kwargs):
        # Calculate the SHA-1 hash of the uploaded file and store to DB.
        sha1_hash = hashlib.sha1()
        with self.image.open("rb") as f:
            # Read and update hash string value in blocks of 4K
            for byte_block in iter(lambda: f.read(4096), b""):
                sha1_hash.update(byte_block)
            self.image_sha1 = sha1_hash.hexdigest()

            super(Image, self).save(*args, **kwargs)

Forms:

class CreateImageForm(forms.ModelForm):

    class Meta:
        model = Image
        fields = ('image', )

    def clean(self):
        cleaned_data = super(CreateImageForm, self).clean()

        uploaded_image = cleaned_data.get('image', None)

        sha1_hash = hashlib.sha1()
        # Not using 'with' to open/close the file, as we need to access it in the model's save() function.
        # Instead, manually opening and closing the file (in case of error).
        f = uploaded_image.open("rb")
        # Read and update hash string value in blocks of 4K
        for byte_block in iter(lambda: f.read(4096), b""):
            sha1_hash.update(byte_block)

        # Check if we already have a patient with the same name/dob
        existing_image = Image.objects.filter(image_sha1=sha1_hash.hexdigest())
        if existing_image:
            f.close()
            raise ValidationError(
                mark_safe("This image had been uploaded before: '<a href='{0}'>{1}</a>'."
                          .format(reverse('image_view', args=(existing_image[0].id,)), existing_image[0]))
                          )

        return cleaned_data

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM