
Downloading HTML instead of File

I'm using Java code to download a file from the Internet and save it to some directory.

However, the code downloads the HTML source code of the page instead of the file contents.

The code below illustrates the problem:

import java.awt.Desktop;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class JavaFileDownloadTest
{
    // Copies whatever the server returns for remoteURL into targetFilePath.
    public static void download(String remoteURL, String targetFilePath)
            throws IOException
    {
        URL downloadableFile = new URL(remoteURL);
        // try-with-resources closes the channel and the output stream, even on failure
        try (ReadableByteChannel readableByteChannel = Channels.newChannel(downloadableFile.openStream());
             FileOutputStream fileOutputStream = new FileOutputStream(targetFilePath))
        {
            fileOutputStream.getChannel().transferFrom(readableByteChannel, 0, Long.MAX_VALUE);
        }
    }

    public static void main(String[] arguments) throws IOException
    {
        String userHome = System.getProperty("user.home");
        String fileName = "Test.txt";
        String targetFilePath = userHome + File.separator + "Downloads" + File.separator + fileName;
        download("http://bullywiiplaza.cuccfree.com/" + fileName, targetFilePath);
        // Open the downloaded file with the default desktop application
        Desktop.getDesktop().open(new File(targetFilePath));
    }
}

The file at http://bullywiiplaza.cuccfree.com/Test.txt contains the text:

Hello StackOverflow!

However, when I download it with the code above, the saved file contains HTML source code instead:

<html><body><script type="text/javascript" src="/aes.js" ></script><script>function toNumbers(d){var e=[];d.replace(/(..)/g,function(d){e.push(parseInt(d,16))});return e}function toHex(){for(var d=[],d=1==arguments.length&&arguments[0].constructor==Array?arguments[0]:arguments,e="",f=0;f<d.length;f++)e+=(16>d[f]?"0":"")+d[f].toString(16);return e.toLowerCase()}var a=toNumbers("f655ba9d09a112d4968c63579db590b4"),b=toNumbers("98344c2eee86c3994890592585b49f80"),c=toNumbers("ae71113e4baf38cee1c1aacf0ae66c00");document.cookie="__test="+toHex(slowAES.decrypt(c,2,a,b))+"; expires=Thu, 31-Dec-37 23:55:55 GMT; path=/"; document.cookie="referrer="+escape(document.referrer); location.href="http://bullywiiplaza.cuccfree.com/Test.txt?ckattempt=1";</script><noscript>This site requires Javascript to work, please enable Javascript in your browser or use a browser with Javascript support</noscript></body></html>

Why does this happen, and how do I fix it? I have already tried various libraries and methods for downloading files, but all of them produced the same "faulty" result.

I think the target URL runs some JavaScript before it serves the file: the page you received sets a cookie and then redirects to the real file. That script has to be interpreted and executed by a JavaScript engine, which a plain Java URL stream does not do.

So you need either a way to obtain the real file URL (and not just the JavaScript page), or a JavaScript engine integrated into your program that executes the script and gives you the result.
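As a rough sketch of the first option, the cookie that the script sets could be computed directly in Java, without a JavaScript engine. This rests on several assumptions that the question does not confirm: that `slowAES.decrypt(c, 2, a, b)` is AES in CBC mode with key `a`, IV `b` and ciphertext `c`, that no padding is stripped from the result, and that the server only checks the `__test` cookie. The class `ChallengeCookie` and its method names are hypothetical, and the three hex strings would probably have to be read from the returned page on each attempt rather than hard-coded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, not part of the question's code.
final class ChallengeCookie
{
    // Counterpart of the page's toNumbers(): hex string to bytes.
    static byte[] fromHex(String hex)
    {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++)
        {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    // Counterpart of the page's toHex(): bytes to lowercase hex.
    static String toHex(byte[] bytes)
    {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes)
        {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Assumption: the script's decrypt call is AES/CBC without padding removal.
    static String solve(String keyHex, String ivHex, String cipherHex)
            throws GeneralSecurityException
    {
        Cipher cipher = Cipher.getInstance("AES/CBC/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE,
                new SecretKeySpec(fromHex(keyHex), "AES"),
                new IvParameterSpec(fromHex(ivHex)));
        return toHex(cipher.doFinal(fromHex(cipherHex)));
    }

    // Repeats the request with the computed cookie and saves the response body.
    static void download(String remoteURL, String cookieValue, String targetFilePath)
            throws IOException
    {
        HttpURLConnection connection = (HttpURLConnection) new URL(remoteURL).openConnection();
        connection.setRequestProperty("Cookie", "__test=" + cookieValue);
        try (InputStream in = connection.getInputStream())
        {
            Files.copy(in, Paths.get(targetFilePath), StandardCopyOption.REPLACE_EXISTING);
        }
        finally
        {
            connection.disconnect();
        }
    }
}
```

With the values from the page in the question, `solve` would be called with the strings assigned to `a`, `b` and `c`, and its result passed to `download` together with the file URL. If the server still answers with the HTML page, one of the assumptions above does not hold and executing the script in a real JavaScript engine is the safer route.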

I think this could help you: Executing javascript in java - Opening a URL and getting links

or better:

http://www.java2s.com/Code/Java/JDK-6/ExecuteJavascriptscriptinafile.htm

I switched my website to a different hosting provider, and now the code above works as expected.

http://bullywiiplaza.cuccfree.com/Test.txt doesn't exist. I think the URL should be https://bullywiiplaza.cuccfree.com/Test.txt, which does exist.

