When will filecmp.cmp() return a false positive or false negative?

Using Windows 7, I have two folders: a “Master” folder where I work on the files, and a “Backup” folder on a NAS4Free server.

I have over 800 jpg files, totaling 2.6GB, ranging in size from 124KB to 16MB.

I frequently “swap” file names, i.e.:

rename 01-020.jpg 99-020.jpg
rename 01-040.jpg 01-020.jpg
rename 99-020.jpg 01-040.jpg

I also add new files (for example 01-030.jpg) and then renumber the set, i.e.:

rename 01-020.jpg 99-020.jpg
rename 01-030.jpg 99-040.jpg
rename 01-040.jpg 99-060.jpg
rename 99-020.jpg 01-020.jpg
rename 99-040.jpg 01-040.jpg
rename 99-060.jpg 01-060.jpg

To keep the Master and Backup folders in sync, I first looked at doing an XCOPY or ROBOCOPY of the entire folder, but that is too time consuming, especially since the vast majority of the files haven't changed.

I'm trying to come up with a Python 3 solution. I've read the documentation on filecmp.cmp(). What worries me is the statement:

“…returns True if they seem equal…” (emphasis mine).

Specifying shallow=False seems to be overkill, causing filecmp to compare the contents of 1,600+ files, when the vast majority of the comparisons will match.

Specifying shallow=True causes filecmp to use the os.stat() function. Running tests with that function on two files for which filecmp returns True, some of the values returned by stat are identical and others are different. Apparently, filecmp doesn't use ALL the values returned by stat to determine if the files are equal.

So, my question: Under what “real-world” situations will filecmp.cmp(file1, file2, shallow=True) return a false positive or a false negative? Can I trust it?

And, a possible “sub-question”: which specific values returned by os.stat() does filecmp.cmp() use?

(If you're curious what I'm doing with the files, I discuss it here: https://hikearizona.com/dex2/viewtopic.php?f=78&t=9538)

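With shallow=True, the comparison returns True without reading the files when both are regular files with the same size and the same modification time. It can therefore return a false positive only if two files with different contents have exactly the same number of bytes and the same modification time. If the modification times differ but the sizes match, cmp() falls back to comparing the contents, so a false negative should not occur.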
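The cases described above can be reproduced with a small experiment. This is only a sketch: the file names, contents, and timestamps are made up for illustration, and os.utime() is used to force the modification times.

```python
import filecmp
import os
import tempfile

def write_file(path, data, mtime):
    """Write bytes to path, then force its access and modification times."""
    with open(path, "wb") as fh:
        fh.write(data)
    os.utime(path, (mtime, mtime))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names; the contents are not real JPEG data.
    a = os.path.join(tmp, "a.jpg")
    b = os.path.join(tmp, "b.jpg")
    c = os.path.join(tmp, "c.jpg")

    write_file(a, b"AAAA", 1_000_000_000)
    write_file(b, b"BBBB", 1_000_000_000)  # same size and mtime as a, different bytes
    write_file(c, b"AAAA", 1_500_000_000)  # same bytes as a, different mtime

    # Signatures match, so the shallow comparison never reads the contents.
    false_positive = filecmp.cmp(a, b, shallow=True)
    # A deep comparison reads both files and notices the difference.
    deep_result = filecmp.cmp(a, b, shallow=False)
    # Signatures differ, sizes match: cmp() compares the contents instead.
    differing_mtime = filecmp.cmp(a, c, shallow=True)

print(false_positive, deep_result, differing_mtime)
```

In practice a false positive needs a tool or script that rewrites a file to the same size and also restores its modification time, which is unusual for photos edited by hand.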

The module source (filecmp.py) shows where this behavior comes from.

Excerpt from the cmp function implementation (filecmp.py):

        s1 = _sig(os.stat(f1))
        s2 = _sig(os.stat(f2))
        if shallow and s1 == s2:
            return True

The _sig function used above (filecmp.py):

def _sig(st):
    return (stat.S_IFMT(st.st_mode),
            st.st_size,
            st.st_mtime)
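To check which stat values matter on your own files, the same tuple can be rebuilt with public calls. The helper name shallow_signature and the file names below are hypothetical; the sketch only shows that a differing access time does not affect the shallow result.

```python
import filecmp
import os
import stat
import tempfile

def shallow_signature(path):
    """Return the (file type, size, mtime) tuple, mirroring the excerpt above."""
    st = os.stat(path)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for a Master file and its Backup copy.
    master = os.path.join(tmp, "master.jpg")
    backup = os.path.join(tmp, "backup.jpg")
    for path in (master, backup):
        with open(path, "wb") as fh:
            fh.write(b"same bytes")

    # Same modification time, deliberately different access times.
    os.utime(master, (1_000_000_000, 1_200_000_000))
    os.utime(backup, (1_100_000_000, 1_200_000_000))

    atimes_differ = os.stat(master).st_atime != os.stat(backup).st_atime
    signatures_match = shallow_signature(master) == shallow_signature(backup)
    shallow_equal = filecmp.cmp(master, backup, shallow=True)

print(atimes_differ, signatures_match, shallow_equal)
```

Other stat fields, such as the inode number or the creation time, are likewise outside the tuple, which is consistent with the observation in the question that some stat values differ between files that compare equal.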
