
Uploading files and preventing duplicates by knowing to update the file

In our system, when a user uploads a file, it is stored in a unique file system structure and a database record is generated. The file is uploaded from the web browser via XMLHttpRequest and then moved from the temporary upload area into the file system (FS).

How can I detect that an uploaded file already exists in my FS? There are two cases:

1. The uploaded file is identical to one already uploaded.
2. The uploaded file is the same file, but its content has been updated, which means I need to update the file in the FS.

I am ignoring file names as a way of knowing whether the file already exists, because a filename cannot be considered unique. For example, some cameras name photos using an incremental number that rolls over after a time. Also, when a file is uploaded via the web browser, the source file structure (e.g. C:\Users\Drive\File\Uploaded\From) is masked, so I can't use that to figure out whether the file has already been uploaded.

How do I know that the file being uploaded already exists because its content is the same? Or that it exists but the uploaded file has been changed, so I can just update the stored file?

Microsoft Word documents create a challenge, as Word regenerates the file on every save.

If the user renames a file of their own accord, I could say tough luck.

I would start by finding files that are the same via a SHA hash. You could use something like this to get a list of files that have the same hash as your newly uploaded file, then take some action.

Just an example of getting the hash of the new file:

    string newfile;
    // Open the uploaded file read-only and hash its content with SHA-1
    using (FileStream fs = new FileStream("C:\\Users\\Drive\\File\\Uploaded\\From\\newfile.txt", FileMode.Open, FileAccess.Read))
    using (var sha1 = System.Security.Cryptography.SHA1.Create())
    {
        // BitConverter.ToString renders the digest as hyphen-separated hex
        newfile = BitConverter.ToString(sha1.ComputeHash(fs));
    }

This goes through all files and gets a list of file names and hashes:

    // Requires: using System.IO; using System.Linq;
    var allfiles = Directory.GetFiles(@"C:\Users\Drive\File\Uploaded\From\", "*.*")
        .Select(f =>
        {
            // Dispose the stream and the hash object once each file is hashed
            using (var stream = new FileStream(f, FileMode.Open, FileAccess.Read))
            using (var sha1 = System.Security.Cryptography.SHA1.Create())
            {
                return new { FileName = f, FileHash = sha1.ComputeHash(stream) };
            }
        })
        .ToList();

This loops through them all and looks for a match to the new one:

    foreach (var fi in allfiles)
    {
        // Compare the stored file's hash with the hash of the new upload
        if (newfile == BitConverter.ToString(fi.FileHash))
            Console.WriteLine("Match!!!");
        Console.WriteLine(fi.FileName + " " + BitConverter.ToString(fi.FileHash));
    }
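Hashing every stored file on each upload can be trimmed with a cheap pre-check: two files with different lengths cannot have the same content, so only same-length files need hashing. The following is a minimal sketch of that idea, not part of the original answer. The class `DuplicateFinder` and its methods `HashFile` and `FindDuplicates` are hypothetical names, and it assumes the uploaded file still sits in a temporary area outside the store directory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper class, introduced only for illustration.
public static class DuplicateFinder
{
    // Returns the SHA-1 digest of a file's content as hyphen-separated hex.
    public static string HashFile(string path)
    {
        using (var stream = File.OpenRead(path))
        using (var sha1 = SHA1.Create())
        {
            return BitConverter.ToString(sha1.ComputeHash(stream));
        }
    }

    // Returns the stored files whose content matches the uploaded file.
    // Assumes uploadedPath is NOT inside storeDirectory.
    public static List<string> FindDuplicates(string uploadedPath, string storeDirectory)
    {
        long uploadedLength = new FileInfo(uploadedPath).Length;

        // Files of a different length cannot match, so they are never hashed.
        var candidates = Directory.GetFiles(storeDirectory)
            .Where(f => new FileInfo(f).Length == uploadedLength)
            .ToList();

        if (candidates.Count == 0)
            return new List<string>();

        // Hash the upload only when at least one candidate remains.
        string uploadedHash = HashFile(uploadedPath);
        return candidates.Where(f => HashFile(f) == uploadedHash).ToList();
    }
}
```

A call such as `DuplicateFinder.FindDuplicates(tempUploadPath, storeRoot)` would return an empty list for a new file and the matching paths otherwise. Both argument names are placeholders.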

Ideally you would save this hash when the file is uploaded, since it is expensive to recompute for every stored file.
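As a rough sketch of saving the hash at upload time, the class below keeps a hash-to-path lookup, so a new upload is hashed once and checked without re-reading stored files. The `HashIndex` class and its `FindOrAdd` method are hypothetical. The in-memory dictionary only stands in for the hash column you would add to the existing database record, and it is an assumption that one stored path per hash is enough.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical in-memory stand-in for a hash column in the file table.
public sealed class HashIndex
{
    // Maps a content hash to the path of the stored file with that content.
    private readonly Dictionary<string, string> pathsByHash =
        new Dictionary<string, string>();

    // Returns the SHA-1 digest of a file's content as hyphen-separated hex.
    public static string HashFile(string path)
    {
        using (var stream = File.OpenRead(path))
        using (var sha1 = SHA1.Create())
        {
            return BitConverter.ToString(sha1.ComputeHash(stream));
        }
    }

    // Returns the path of an already-registered file with identical content.
    // If there is none, registers the uploaded file and returns null.
    public string FindOrAdd(string uploadedPath)
    {
        string hash = HashFile(uploadedPath);
        if (pathsByHash.TryGetValue(hash, out string existingPath))
            return existingPath;

        pathsByHash[hash] = uploadedPath;
        return null;
    }
}
```

This only detects byte-identical content. An edited file, or a Word document regenerated on save, produces a different hash and would be registered as new, so the "same file but updated" case still needs some other signal.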
