Python MD5 hashing same content returns different hash

I am writing a Python program (because I am lazy) that checks a website for a job opening I have been told about and returns all the jobs listed on the company's web page.

Here is my code so far (yes, I know the code is janky, but I am just trying to get it working):

import requests
from bs4 import BeautifulSoup
import sys
import os
import hashlib

reload(sys)
sys.setdefaultencoding('utf8')

res = requests.get('WEBSITE URL', verify=False)
res.raise_for_status()

filename = "JobWebsite.txt"

def StartUp():
    # Create the output file if needed, then open it for reading and writing
    if not os.path.isfile(filename):
        try:
            open(filename, 'a').close()  # create the empty file without leaking a handle
            jobfile = open(filename, 'r+')
            print("[*] Successfully created output file")
            return jobfile
        except IOError:
            print("[*] Error creating output file!")
            sys.exit(1)  # non-zero exit code signals failure
    else:
        try:
            jobfile = open(filename, 'r+')
            print("[*] Successfully opened output file")
            return jobfile
        except IOError:
            print("[*] Error opening output file!")
            sys.exit(1)

def AnyChange(htmlFile):
    fileCont = htmlFile.read()
    FileHash = hasher(fileCont, "File Code Hashed")
    WebHash = hasher(res.text, "Webpage Code Hashed")
    # !!!!! Here is the problem
    print ("[*] File hash is " + str(FileHash))
    print ("[*] Website hash is " + str(WebHash))
    if FileHash == WebHash:
        print ("[*] Jobs being read from file!")
        num_of_jobs(fileCont)
    else:
        print("[*] Jobs being read from website!")
        num_of_jobs(res.text)
        deleteContent(htmlFile)
        writeWebContent(htmlFile, res.text)

def hasher(content, message):
    # hashlib.md5() returns a hash object; .hexdigest() on it gives the digest string
    content = hashlib.md5(content.encode('utf-8'))
    return content

def num_of_jobs(htmlFile):
    content = BeautifulSoup(htmlFile, "html.parser")
    elems = content.select('.search-result-inner')
    print("[*] There are " + str(len(elems)) + " jobs available!")

def deleteContent(htmlFile):
    print("[*] Deleting Contents of local file! ")
    htmlFile.seek(0)
    htmlFile.truncate()

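def writeWebContent(htmlFile, content):
    # Write through the handle passed in instead of reopening (and leaking) the file
    print("[*] Writing contents of website to file!")
    htmlFile.write(content.encode('utf-8'))
    htmlFile.flush()  # make sure the new content reaches the disk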

jobfile = StartUp()
AnyChange(jobfile)

The problem I currently have is that I hash both the website's HTML and the file's HTML, but the two hashes never match. I am not sure why and can only guess that it has something to do with the contents being saved in a file. The hashes aren't too far apart, but they still cause the if statement to fail every time.

[Screenshot: breakpoint in the program showing the two hashes]

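The screenshot you attached shows the memory locations of the two hash objects, FileHash and WebHash. They are two separate objects, so they will always be at different locations. hashlib hash objects do not define value-based equality, so == compares object identity and is always False here, even when both objects were fed identical data.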
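A minimal Python 3 sketch (not part of the original answer) that reproduces the behaviour with two hash objects built from the same input. The string "same page" is an arbitrary example value:

```python
import hashlib

# Two hash objects fed exactly the same bytes
first = hashlib.md5("same page".encode("utf-8"))
second = hashlib.md5("same page".encode("utf-8"))

# Hash objects don't define value equality, so == falls back to identity
print(first == second)                          # False
# str() gives the object's repr, which includes a memory address, not the digest
print(str(first))
# The digests are what actually depend on the content
print(first.hexdigest() == second.hexdigest())  # True
```

The slightly different addresses in the two repr strings are what looked like "hashes that aren't too far apart".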

What you really want to compare is the hexdigest() of the two hash objects. Change your if statement to:

if FileHash.hexdigest() == WebHash.hexdigest():
    print ("[*] Jobs being read from file!")
    num_of_jobs(fileCont)
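The question's code targets Python 2 (reload(sys), sys.setdefaultencoding). Assuming Python 3 instead, a minimal sketch of the same read-compare-rewrite flow with the digest comparison applied might look like this. `md5_hex` and `page_changed` are hypothetical helper names, you would pass the downloaded `res.text` as `page_html`, and opening the file with `newline=""` is an added assumption so that line endings are stored and read back unchanged:

```python
import hashlib
import os
import tempfile


def md5_hex(text):
    """Return the MD5 hex digest (a plain string) of text encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def page_changed(cache_path, page_html):
    """Return True if page_html differs from the cached copy, refreshing the cache.

    Hypothetical helper standing in for StartUp/AnyChange/deleteContent/writeWebContent.
    """
    if os.path.isfile(cache_path):
        # newline="" keeps line endings exactly as they were written
        with open(cache_path, encoding="utf-8", newline="") as f:
            cached = f.read()
        # Compare the digest strings, not the hash objects
        if md5_hex(cached) == md5_hex(page_html):
            return False  # unchanged: the jobs can be read from the cached file

    # Mode "w" truncates the file, which replaces the seek(0) + truncate() step
    with open(cache_path, "w", encoding="utf-8", newline="") as f:
        f.write(page_html)
    return True


# Usage with a throwaway directory instead of the real JobWebsite.txt
cache = os.path.join(tempfile.mkdtemp(), "JobWebsite.txt")
print(page_changed(cache, "<html>v1</html>"))  # True: no cache yet
print(page_changed(cache, "<html>v1</html>"))  # False: same content
print(page_changed(cache, "<html>v2</html>"))  # True: content changed
```

Since both strings are already in memory, comparing cached == page_html directly would work just as well; hashing only pays off if you store the digest instead of the whole page.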

Take a look at this other Stack Overflow answer for more on how to do this.
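The question also guessed that saving the page to a file might change it. That can happen independently of the hash-object mistake. In Python 3, a file opened in text mode with the default newline=None uses universal newlines, so "\r\n" line endings come back as "\n" and the digest of the saved copy no longer matches. The sketch below assumes a page served with "\r\n" line endings, which the original post does not confirm:

```python
import hashlib
import os
import tempfile

# Assumed example: HTML served with Windows-style "\r\n" line endings
html = "<ul>\r\n<li>Job</li>\r\n</ul>\r\n"
path = os.path.join(tempfile.mkdtemp(), "JobWebsite.txt")


def roundtrip(newline):
    """Write html to path and read it back using the given newline setting."""
    with open(path, "w", encoding="utf-8", newline=newline) as f:
        f.write(html)
    with open(path, encoding="utf-8", newline=newline) as f:
        return f.read()


def md5_hex(text):
    """Return the MD5 hex digest of text encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


default_copy = roundtrip(None)  # universal newlines: "\r\n" is read back as "\n"
exact_copy = roundtrip("")      # no newline translation: text comes back unchanged

print(md5_hex(default_copy) == md5_hex(html))  # False
print(md5_hex(exact_copy) == md5_hex(html))    # True
```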

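Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.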

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM