
How to grab all files in a folder and get their MD5 hash in Python?

I'm trying to write some code to get the MD5 of every exe file in a folder.

My problem is that I don't understand how to do it. My code works only if the folder contains exactly one file. This is my code:

import glob
import hashlib
file = glob.glob("/root/PycharmProjects/untitled1/*.exe")

newf = str (file)
newf2 =  newf.strip( '[]' )
newf3 = newf2.strip("''")

with open(newf3,'rb') as getmd5:
    data = getmd5.read()
    gethash= hashlib.md5(data).hexdigest()
    print gethash

And I get the result:

a7f4518aae539254061e45424981e97c

I want to know how I can do this for more than one file in the folder.

glob.glob returns a list of file paths. Just iterate over the list using for:

import glob
import hashlib

filenames = glob.glob("/root/PycharmProjects/untitled1/*.exe")

for filename in filenames:
    with open(filename, 'rb') as inputfile:
        data = inputfile.read()
        print(filename, hashlib.md5(data).hexdigest())

Note that this can exhaust your memory if there happens to be a large file in that directory, so it is better to read each file in smaller chunks (adapted here to 1 MiB blocks):

def md5(fname):
    hash_md5 = hashlib.md5()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(2 ** 20), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

for filename in filenames:
    print(filename, md5(filename))
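As a self-contained variant of the chunked approach above, here is a minimal sketch that collects the digests into a dictionary instead of printing them. The names md5_of_file and md5_of_folder, the block_size parameter, and the default "*.exe" pattern are illustrative choices, not part of the original answer.

```python
import glob
import hashlib
import os


def md5_of_file(path, block_size=2 ** 20):
    """Return the MD5 hex digest of one file, read in block_size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b""
        for chunk in iter(lambda: f.read(block_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_folder(folder, pattern="*.exe"):
    """Map each matching file path in folder to its MD5 hex digest."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    # Skip directories that happen to match the pattern
    return {path: md5_of_file(path) for path in paths if os.path.isfile(path)}
```

Calling md5_of_folder("/root/PycharmProjects/untitled1") would then return a dict that can be looped over with .items() to print each path next to its hash.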

I think that, in the end, you're only trying to open a single file. The reason is that you take the list returned by glob and remove the list markers from its string representation (and only at both ends of the string, since you use strip). This gives you something like:

file1.exe', 'file2.exe', 'file3.exe

You then pass this string to open, which tries to open a file with exactly that name. In fact, I'm even surprised it works at all (unless you have only one file)! You should get a FileNotFoundError (an IOError on Python 2).
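To see the effect concretely, here is a small sketch that replays the str and strip steps on a hard-coded list. The file names are hypothetical stand-ins for whatever glob.glob would return.

```python
# Hypothetical file names standing in for the list glob.glob would return.
files = ["file1.exe", "file2.exe", "file3.exe"]

as_text = str(files)                 # the list's repr, brackets and quotes included
no_brackets = as_text.strip("[]")    # removes only the leading [ and trailing ]
mangled = no_brackets.strip("''")    # removes only the outermost quote characters

# The inner quotes and commas survive, so this is not a valid path.
print(mangled)

# With a single entry, the same steps happen to leave a usable path.
single = str(["only.exe"]).strip("[]").strip("''")
print(single)
```

This is why the original code appears to work when the folder holds exactly one matching file and fails otherwise.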

What you want to do is iterate over all the files returned by glob.glob:

import glob
import hashlib

# glob.glob returns a list of matching paths
files = glob.glob("/root/PycharmProjects/untitled1/*.exe")

for f in files:
    with open(f, 'rb') as getmd5:
        data = getmd5.read()
        gethash = hashlib.md5(data).hexdigest()
        # print the file path next to its hash
        print(f + ": " + gethash)


 