
Eclipse/Ant MD5 does not match any other MD5

So I'm still struggling to follow this tutorial: http://code.google.com/p/dkpro-core-asl/wiki/MyFirstDKProProject

I'm stuck at another place with a very strange MD5 problem: I do not understand why Eclipse/Ant calculates a different MD5 than the one I get with md5sum (Cygwin) or with Python, for example!

Eclipse/Ant message:

BUILD FAILED

D:\eclipseWorkspace\maven.1334761781732\branches\1.2.x\de.tudarmstadt.ukp.dkpro.core.treetagger\src\scripts\build.xml:34: The following error occurred while executing this line:
D:\eclipseWorkspace\maven.1334761781732\branches\1.2.x\de.tudarmstadt.ukp.dkpro.core.treetagger\src\scripts\build.xml:311: The following error occurred while executing this line:
D:\eclipseWorkspace\maven.1334761781732\branches\1.2.x\de.tudarmstadt.ukp.dkpro.core.treetagger\src\scripts\build.xml:451: MD5 checksum mismatch for [la-tagger-little-endian.par]. 
Please verify the checksum and if necessary update this script. 
Expected: f959f8633ef842f069f0331ad19dc8b4
Actual  : bde1f6a63b2c5a658ba25a8eb90832a8

OK, this is plausible, since the file may have changed on the FTP server. Here is the relevant part of the Ant build.xml file:

<target name="la">
    <property name="version.la" value="2011050700"/>

    <install-model-file url="ftp://ftp.ims.uni-stuttgart.de/pub/corpora/latin-par-linux-3.2.bin.gz"
        type="tagger" endianness="little-endian" language="la" encoding="ISO-8859-1"
        md5="f959f8633ef842f069f0331ad19dc8b4"/>
</target>

Where things get weird for me is here:

Using Cygwin (after manually downloading the file via FTP with FileZilla, in binary or auto mode, of course not ASCII):

$ md5sum latin-par-linux-3.2.bin.gz
e77493eed28857bf93aca91c2a6e5a9b *latin-par-linux-3.2.bin.gz

Using Python:

import urllib
import hashlib
data = urllib.urlopen("ftp://ftp.ims.uni-stuttgart.de/pub/corpora/latin-par-linux-3.2.bin.gz").read()
md5 = hashlib.md5()
md5.update(data)
print md5.hexdigest()
e77493eed28857bf93aca91c2a6e5a9b

or

def md5_for_file(filePath):
    md5 = hashlib.md5()
    file = open(filePath, 'rb')
    while True:
        data = file.read(8192)
        if not data:
            break
        md5.update(data)

    file.close()   
    return md5.hexdigest()

print md5_for_file(r"D:\ftp.ims.uni-stuttgart.de.pub.corpora.20120419\latin-par-linux-3.2.bin.gz")
e77493eed28857bf93aca91c2a6e5a9b

I also used a freeware tool from the web to calculate the MD5. All of these results match each other, BUT they differ from the one Ant reports as "Actual"!

To calculate the MD5 manually, you are supposed to extract the file first.

Use gunzip or 7-Zip.
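As a rough sketch of that suggestion, the decompression and hashing can be done in one step in Python. This assumes the Ant script hashes the unpacked model rather than the .gz archive; the error message names la-tagger-little-endian.par instead of the .gz file, which hints at that, but it is not confirmed from build.xml. The function names are made up for illustration, and the code is written for Python 3.

```python
import gzip
import hashlib


def md5_of_file(path, chunk_size=8192):
    """MD5 hex digest of the file exactly as stored on disk (the .gz archive)."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def md5_of_gunzipped(gz_path, chunk_size=8192):
    """MD5 hex digest of the decompressed contents of a .gz file."""
    md5 = hashlib.md5()
    # gzip.open decompresses on the fly, so no temporary file is needed.
    with gzip.open(gz_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Calling both functions on the downloaded latin-par-linux-3.2.bin.gz and comparing the second result with the "Actual" value from the Ant output would show whether the difference comes only from decompression.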

I'm a DKPro Core developer. The reason we do these MD5 checks is that we want to notice when a remote file changes without any announcement.

You don't have to calculate the MD5 sum yourself. The script tells you which MD5 it knows and what it actually got. If you want the script to continue running, just replace the MD5 recorded in the build.xml with the one it reported as the "actual" one. You should, however, also update the version.

The following passage is from our wiki and explains the rationale behind this:

Not all of the resources are properly versioned by their maintainers. We observe that resources change from one day to the next without any announcement or increase of the version number (if present at all). Thus, we validate all resources against MD5 checksums stored in the build.xml file. This way, we can notice if a remote resource has been changed. When this happens, we add a note to the build.xml file indicating when we noticed the MD5 change and update the version of the corresponding resource.

Since we do not test the build.xml files every day, you may get an MD5 checksum error when you try to package the resources yourself. If this happens, open the build.xml file with a text editor, locate the MD5 checksum that fails, update it and update the version of the corresponding resource. You can also tell us on the DKPro Core User Group and we will update the build.xml file.
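If you prefer not to edit the file by hand, the same two changes could be scripted. The following is only a minimal sketch in Python 3, not part of DKPro Core: the function names are hypothetical, it treats build.xml as plain text, and it assumes the attribute layout shown in the question (`md5="..."` and `<property name="version.la" value="..."/>`). The new version string is something you choose yourself.

```python
import re


def update_checksum(build_xml_text, expected_md5, actual_md5):
    """Swap the recorded MD5 for the one Ant reported as 'Actual'."""
    old = 'md5="%s"' % expected_md5
    # Refuse to guess if the checksum is missing or appears more than once.
    if build_xml_text.count(old) != 1:
        raise ValueError("expected exactly one occurrence of %s" % old)
    return build_xml_text.replace(old, 'md5="%s"' % actual_md5)


def update_version(build_xml_text, language, new_version):
    """Set the value of the version.<language> property."""
    pattern = r'(<property name="version\.%s" value=")[^"]*(")' % re.escape(language)
    # A function is used as the replacement so that digits in new_version
    # are not misread as backreferences.
    new_text, count = re.subn(
        pattern,
        lambda m: m.group(1) + new_version + m.group(2),
        build_xml_text,
    )
    if count != 1:
        raise ValueError("expected exactly one version.%s property" % language)
    return new_text
```

For the error in the question, this would mean calling `update_checksum` with the "Expected" and "Actual" values from the Ant output, then `update_version` with `"la"` and a new version string, and writing the result back to build.xml.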

By the way, the tutorial has since been changed to use different components for which we can distribute the models, so this should rarely be an issue anymore.
