
Download a file processed by the server, with HtmlUnit

I want to automate the file conversion available at https://www.gpsvisualizer.com/map_input?form=googleearth. My problem is that gpsvisualizer only converts one file at a time, but I have 500 files to convert. So I used HtmlUnit to automate the process.

Thanks to the following code, I am able to set "select" fields such as:

  1. "Output file type"
  2. "Add DEM elevation data"

I can also upload my file and get the URL of the redirected HTML page, where the wanted file can be downloaded.

My problem is that I cannot find a way to download the file.

Does anyone have a suggestion?

Thanks in advance.

Here is my code:

    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlElement;
    import com.gargoylesoftware.htmlunit.html.HtmlFileInput;
    import com.gargoylesoftware.htmlunit.html.HtmlForm;
    import com.gargoylesoftware.htmlunit.html.HtmlOption;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.html.HtmlSelect;

    public class GpsVisualizerUpload {
        public static void main(String[] args) throws Exception {
            // try-with-resources closes the client (and its windows) when done
            try (WebClient webClient = new WebClient()) {
                webClient.getOptions().setCssEnabled(false);
                webClient.getOptions().setJavaScriptEnabled(true);
                webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
                webClient.getOptions().setThrowExceptionOnScriptError(false);
                webClient.getOptions().setRedirectEnabled(true);

                // fetch the Google Earth input form
                String url = "https://www.gpsvisualizer.com/map_input?form=googleearth";
                HtmlPage page = webClient.getPage(url);
                System.out.println(page.getUrl());
                System.out.println(page.getTitleText());

                // "Output file type": select uncompressed .kml
                HtmlSelect selectFileType = (HtmlSelect) page.getElementByName("googleearth_zip");
                System.out.println(selectFileType.getOption(0).asText());
                HtmlOption kmlFile = selectFileType.getOptionByText(".kml (uncompressed)");
                System.out.println(kmlFile.asText());
                selectFileType.setSelectedAttribute(kmlFile, true);

                // "Add DEM elevation data": select NASA SRTM1
                HtmlSelect selectElevation = (HtmlSelect) page.getElementByName("add_elevation");
                System.out.println(selectElevation.getOption(4).asText());
                HtmlOption europeSRTM1 = selectElevation.getOptionByText("NASA SRTM1 (30m res., NoAm, Europe, more)");
                System.out.println(europeSRTM1.asText());
                selectElevation.setSelectedAttribute(europeSRTM1, true);

                // attach the file to upload
                HtmlForm myForm = page.getFormByName("main");
                HtmlFileInput fileInput = myForm.getInputByName("uploaded_file_1");
                fileInput.setValueAttribute("/media/Stock/Projets/Suratram/Ressources/Traces_WS/puissance/kml_files/01_douce-signoret.kml");
                HtmlElement submitBtn = page.getElementByName("submitted");

                // submit the form and print the URL of the result page
                HtmlPage page2 = submitBtn.click();
                System.out.println(page2.getUrl());
            }
        }
    }
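Since the real goal is to convert about 500 files, the single-file steps above could be wrapped in a loop. The following is a minimal sketch, assuming all input files sit in one directory and that a hypothetical `Converter` callback wraps the HtmlUnit upload-and-download steps for one file; `BatchConvert`, `Converter` and `convertAll` are illustrative names, not part of HtmlUnit. Failures are collected instead of aborting the run, so one flaky upload does not stop the remaining files.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

class BatchConvert {
    /** Hypothetical hook: converts one input file and writes the result to outFile. */
    @FunctionalInterface
    interface Converter {
        void convert(Path inFile, Path outFile) throws IOException;
    }

    /** Converts every .kml file in inDir into outDir and returns the inputs that failed. */
    static List<Path> convertAll(Path inDir, Path outDir, Converter converter) throws IOException {
        Files.createDirectories(outDir);
        List<Path> inputs;
        try (Stream<Path> listing = Files.list(inDir)) {
            inputs = listing
                    .filter(p -> p.getFileName().toString().endsWith(".kml"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<Path> failed = new ArrayList<>();
        for (Path in : inputs) {
            // the output keeps the input's file name, in the output directory
            Path out = outDir.resolve(in.getFileName());
            try {
                converter.convert(in, out);
            } catch (IOException e) {
                failed.add(in); // remember it and continue with the next file
            }
        }
        return failed;
    }
}
```

The returned list makes it easy to rerun only the files that failed.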

Because I have no sample file, I can only give some general advice.

HtmlUnit handles downloads a bit unusually; in general it works like this:

  • There is no download: every response is loaded into a window. HtmlUnit either replaces the content of the current window or creates a new window with an UnknownPage, making the content available as a stream. The decision to open a new window is based on the content type (and some other factors, e.g. the target of an anchor). As a rule of thumb, you can expect the download to end up in a new window if a real browser would show its download dialog.

What this means: I guess your page returns something that HtmlUnit detects as a separate download. You can ask the WebClient for the available windows (webClient.getWebWindows()), and there might be a new one after the submit/click (you may have to add some waiting if asynchronous JavaScript is involved). This new window will contain an UnknownPage as its enclosed page, and you can ask that page for the response, similar to this:
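The "compare the window list before and after the click" idea can be sketched in plain Java. This is a hypothetical helper (`NewWindows` is not an HtmlUnit API) that returns the items present in the second list but not in the first, comparing by object identity, which is how two distinct window objects would be told apart.

```java
import java.util.*;

/** Hypothetical helper, not part of HtmlUnit. */
class NewWindows {
    /** Returns the elements of after that were not in before, compared by identity. */
    static <T> List<T> newSince(List<? extends T> before, List<? extends T> after) {
        Set<T> known = Collections.newSetFromMap(new IdentityHashMap<>());
        known.addAll(before);
        List<T> added = new ArrayList<>();
        for (T item : after) {
            if (!known.contains(item)) {
                added.add(item);
            }
        }
        return added;
    }
}
```

With HtmlUnit, one would snapshot `new ArrayList<>(webClient.getWebWindows())` before `submitBtn.click()`, optionally call `webClient.waitForBackgroundJavaScript(10_000)` if asynchronous JavaScript is involved, and then pass both lists to `newSince`; the 10-second timeout is an arbitrary example value.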

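    Page newPage = newWin.getEnclosedPage(); // UnknownPage inside the new window
    WebResponse newResponse = newPage.getWebResponse();
    // "converted.kml" is a placeholder target; IOUtils comes from commons-io
    try (InputStream in = newResponse.getContentAsStream();
         OutputStream outStream = new FileOutputStream("converted.kml")) {
        IOUtils.copy(in, outStream);
    } catch (IOException e) {
        e.printStackTrace();
    }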
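The snippet above relies on commons-io's `IOUtils` and leaves the choice of output file to the caller. As an alternative, here is a sketch that uses only `java.nio.file.Files.copy` and takes the file name from a `Content-Disposition` header when the server sends one; whether gpsvisualizer sends that header is an assumption, hence the fallback name. `SaveDownload`, `fileNameFrom` and `save` are hypothetical names.

```java
import java.io.*;
import java.nio.file.*;
import java.util.regex.*;

class SaveDownload {
    // matches filename=name or filename="name" (not the RFC 6266 filename* form)
    private static final Pattern FILENAME = Pattern.compile("filename=\"?([^\";]+)\"?");

    /** Returns the file name from a Content-Disposition value, or fallback if none is usable. */
    static String fileNameFrom(String contentDisposition, String fallback) {
        if (contentDisposition == null) return fallback;
        Matcher m = FILENAME.matcher(contentDisposition);
        if (!m.find()) return fallback;
        // keep only the last path component so a server cannot write outside the target dir
        Path name = Paths.get(m.group(1).trim()).getFileName();
        if (name == null || name.toString().isEmpty()) return fallback;
        return name.toString();
    }

    /** Copies the stream to dir/fileName, replacing an existing file, and closes the stream. */
    static Path save(InputStream in, Path dir, String fileName) throws IOException {
        Path target = dir.resolve(fileName);
        try (InputStream src = in) {
            Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

With HtmlUnit this could be called as `SaveDownload.save(newResponse.getContentAsStream(), outDir, SaveDownload.fileNameFrom(newResponse.getResponseHeaderValue("Content-Disposition"), "converted.kml"))`, where `outDir` is a hypothetical target directory.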

As an alternative, you can implement a WebWindowListener (it has to be registered with the client) to be informed when a new window gets created.
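Because the listener may be called while background JavaScript is running, a simple way to hand the new window over to the main code is a blocking queue. Below is a sketch of a hypothetical helper (`OpenedCollector` is not an HtmlUnit class) that the listener feeds and the main thread waits on, with a timeout.

```java
import java.util.concurrent.*;

/** Hypothetical helper: collects items reported by a callback and lets another thread wait for them. */
class OpenedCollector<T> {
    private final BlockingQueue<T> opened = new LinkedBlockingQueue<>();

    /** Called from the listener callback, possibly on a different thread. */
    public void onOpened(T item) {
        opened.add(item);
    }

    /** Waits up to timeoutMs for the next reported item; returns null on timeout. */
    public T awaitNext(long timeoutMs) throws InterruptedException {
        return opened.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

To connect it, one would register an anonymous `WebWindowListener` via `webClient.addWebWindowListener(...)` and forward the interesting objects to `onOpened`. Since a window may still be empty when `webWindowOpened` fires, it is probably safer to forward `event.getNewPage()` from `webWindowContentChanged` when that page is an `UnknownPage`; this ordering detail is an assumption worth checking against the HtmlUnit version in use.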

Hope that helps. If you need more, please open an issue on GitHub and provide your input file together with the code, so I can reproduce your case.

Here is the answer to my problem.

Following the HtmlUnit documentation, I had a problem trying to convert the download page to a "WebWindow" object:

    HtmlPage page = webClient.getPage(uri);
    WebWindow window = page.getEnclosingWindow();

So in the end, I did not need to convert it to a "WebWindow". I just parse the anchors to find mine and catch the "WebResponse" to get the processed file.
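The "parse the anchors to find mine" step can be kept independent of HtmlUnit types with a small generic finder. This is a sketch: `DownloadLinkFinder` is a hypothetical name, and the predicate that recognizes the download link (for example an href ending in `.kml`) is an assumption about the result page, which the post does not show.

```java
import java.util.*;
import java.util.function.*;

/** Hypothetical helper, not part of HtmlUnit. */
class DownloadLinkFinder {
    /** Returns the first anchor whose href satisfies isWanted, if any. */
    static <A> Optional<A> find(List<A> anchors, Function<A, String> hrefOf, Predicate<String> isWanted) {
        for (A anchor : anchors) {
            String href = hrefOf.apply(anchor);
            // anchors without an href are skipped
            if (href != null && isWanted.test(href)) {
                return Optional.of(anchor);
            }
        }
        return Optional.empty();
    }
}
```

With HtmlUnit it could be called as `DownloadLinkFinder.find(page2.getAnchors(), HtmlAnchor::getHrefAttribute, h -> h.endsWith(".kml"))`; clicking the returned `HtmlAnchor` gives a `Page` whose `getWebResponse().getContentAsStream()` holds the processed file.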

You can find more details at: https://github.com/HtmlUnit/htmlunit/issues/352

Thanks to RBRi for the help.

Best

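Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.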

 
© 2020-2024 STACKOOM.COM