
Download a file processed by the server, with HtmlUnit

I want to automate the file conversion available at: https://www.gpsvisualizer.com/map_input?form=googleearth . My problem is that gpsvisualizer converts files one at a time through its web form, but I have 500 files to convert. So I used HtmlUnit to automate the process.

Thanks to the following code, I am able to set "select" elements such as:

  1. "Output file type"
  2. "Add DEM elevation data"

upload my file, and get the URL of the redirected HTML page where I can download the wanted file.

My problem is that I cannot find a way to download the file.

Does anyone have a suggestion?

Thanks in advance.

Here is my code:

    WebClient webClient = new WebClient();
    webClient.getOptions().setCssEnabled(false);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setRedirectEnabled(true);
    
    

    // Fetch the web page
    String url = "https://www.gpsvisualizer.com/map_input?form=googleearth";
    HtmlPage page = webClient.getPage(url);
    
    System.out.println(page.getUrl());
    
    System.out.println(page.getTitleText());
    
    // Select the uncompressed .kml output file type
    HtmlSelect selectFileType = (HtmlSelect) page.getElementByName("googleearth_zip");
    System.out.println(selectFileType.getOption(0).asText());
    //System.out.println(selectFileType.getOption(1).asText());
    
    HtmlOption kmlFile = selectFileType.getOptionByText(".kml (uncompressed)");
    System.out.println(kmlFile.asText());
    selectFileType.setSelectedAttribute(kmlFile, true);
    
    // Select the DEM elevation data to add to the file
    HtmlSelect selectelevation = (HtmlSelect) page.getElementByName("add_elevation");
    System.out.println(selectelevation.getOption(4).asText());
    
    HtmlOption europeSRTM1 = selectelevation.getOptionByText("NASA SRTM1 (30m res., NoAm, Europe, more)");
    System.out.println(europeSRTM1.asText());
    selectelevation.setSelectedAttribute(europeSRTM1, true);
    
    // Attach the file to upload
    HtmlForm myForm = page.getFormByName("main");
    HtmlFileInput fileInput = myForm.getInputByName("uploaded_file_1");
    fileInput.setValueAttribute("/media/Stock/Projets/Suratram/Ressources/Traces_WS/puissance/kml_files/01_douce-signoret.kml");
    HtmlElement submitBtn = page.getElementByName("submitted");
    
    // Submit the form and get the resulting page
    HtmlPage page2 = submitBtn.click();
    System.out.println(page2.getUrl());

Because I have no sample file, I can only give some general advice.

HtmlUnit is a bit strange about downloads. In general it works like this:

  • there is no download: every response is loaded into a window. HtmlUnit either replaces the content of the current window or creates a new window with an UnknownPage that makes the content available as a stream. The decision to create a new window is based on the content type (and some other factors, e.g. the target of an anchor). As a rule of thumb, you can expect the download to be inside a new window if a real browser would show a download dialog.

What does this mean? I guess your page will return something that HtmlUnit detects as a separate download. You can ask the WebClient for the available windows (webClient.getWebWindows()), and there might be a new one after the submit/click (you may have to add some waiting if async JS is involved). This new window will contain an UnknownPage as its enclosed page, and you can ask the unknown page for the response, similar to this:

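    // newWin is the window that appeared after the submit/click
    Page newPage = newWin.getEnclosedPage(); // UnknownPage inside the window
    WebResponse newResponse = newPage.getWebResponse();
    // "converted.kml" is only a placeholder output path
    try (InputStream in = newResponse.getContentAsStream();
         OutputStream outStream = new FileOutputStream("converted.kml")) {
        IOUtils.copy(in, outStream);
    } catch (IOException e) {
        e.printStackTrace();
    }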
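
One possible way to spot the window that the submit opened is to snapshot the window list before and after the click and keep the difference. The helper below is a minimal, library-independent sketch of that comparison. `WindowDiff` and `openedSince` are hypothetical names that are not part of HtmlUnit; the assumption is that the two lists would be copies of `webClient.getWebWindows()` taken just before and just after `submitBtn.click()`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of HtmlUnit.
final class WindowDiff {
    private WindowDiff() { }

    /**
     * Returns the elements of "after" that do not appear in "before".
     * Elements are compared by identity, because each window is a distinct object.
     */
    static <T> List<T> openedSince(List<T> before, List<T> after) {
        List<T> opened = new ArrayList<>();
        for (T candidate : after) {
            boolean alreadyKnown = false;
            for (T old : before) {
                if (old == candidate) {
                    alreadyKnown = true;
                    break;
                }
            }
            if (!alreadyKnown) {
                opened.add(candidate);
            }
        }
        return opened;
    }
}
```

If the returned list is empty, the response probably replaced the content of the current window instead of opening a new one, so the page returned by the click itself would be the place to look.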

As an alternative, you can implement a WebWindowListener (it has to be registered with the client) to be informed when a new window gets created.
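
A listener along these lines might look like the sketch below. It assumes the HtmlUnit 2.x package name `com.gargoylesoftware.htmlunit` (3.x moved to `org.htmlunit`), and `DownloadCollector` is a hypothetical class name. It also assumes that any page that is not an `HtmlPage` is a candidate download, since the page type HtmlUnit creates depends on the response content type and may not always be an `UnknownPage`.

```java
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebWindowEvent;
import com.gargoylesoftware.htmlunit.WebWindowListener;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.util.ArrayList;
import java.util.List;

// Hypothetical listener that remembers every non-HTML page loaded into any window.
class DownloadCollector implements WebWindowListener {
    private final List<Page> downloads = new ArrayList<>();

    @Override
    public void webWindowOpened(WebWindowEvent event) {
        // Nothing to do here: the content is handled in webWindowContentChanged.
    }

    @Override
    public void webWindowContentChanged(WebWindowEvent event) {
        Page newPage = event.getNewPage();
        // Assumption: anything that is not an HTML page is a candidate download.
        if (newPage != null && !(newPage instanceof HtmlPage)) {
            downloads.add(newPage);
        }
    }

    @Override
    public void webWindowClosed(WebWindowEvent event) {
        // Nothing to clean up in this sketch.
    }

    List<Page> getDownloads() {
        return downloads;
    }
}
```

To use it, one would register an instance before the click with `webClient.addWebWindowListener(collector)` and read `collector.getDownloads()` afterwards; each collected page exposes its content through `getWebResponse().getContentAsStream()`.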

Hope that helps. If you need more, please open an issue on GitHub and provide your input file together with the code so that I can reproduce your case.

Here is the answer to my problem.

Following the HtmlUnit documentation, I had a problem trying to convert the download page to a "WebWindow" object:

    HtmlPage page = webClient.getPage(uri);
    WebWindow window = page.getEnclosingWindow();

So in the end, I did not need to convert it to a "WebWindow". I just had to parse the anchors to find mine and take its "WebResponse" to get the processed file.
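
The anchor selection step could be sketched roughly as follows. This is only an illustration under assumptions: `DownloadLinks` and `firstWithExtension` are hypothetical names, and matching on a `.kml` suffix is a guess, since the exact link on the results page is not shown here. The list of hrefs is assumed to come from calling `getHrefAttribute()` on each entry of `page2.getAnchors()`.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper for illustration; not part of HtmlUnit.
final class DownloadLinks {
    private DownloadLinks() { }

    /** Returns the first href whose path ends with the given extension, ignoring case. */
    static Optional<String> firstWithExtension(List<String> hrefs, String extension) {
        String wanted = extension.toLowerCase(Locale.ROOT);
        for (String href : hrefs) {
            if (href == null) {
                continue;
            }
            // Drop any query string or fragment before comparing the suffix.
            String path = href.split("[?#]", 2)[0];
            if (path.toLowerCase(Locale.ROOT).endsWith(wanted)) {
                return Optional.of(href);
            }
        }
        return Optional.empty();
    }
}
```

Once the matching anchor is found, calling `click()` on it returns a page whose `getWebResponse().getContentAsStream()` gives the content of the processed file.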

You can find more details at: https://github.com/HtmlUnit/htmlunit/issues/352

Thanks to RBRi for the help.

Best
