
Jsoup: Getting a link that doesn't show in the HTML

I am working on a small app for myself and am trying to get a list of links from a site, for example http://kinox.to/Stream/Prison_Break.html.
If you hover over the big window in the middle that says "kinox.to best online", the browser shows the link I want in the bottom left. The problem is that when I look at the HTML file, I can't find the link anywhere. I guess it has something to do with the site using JavaScript or Ajax.
Is it possible to get the link using Jsoup, or are there other Java libraries that could help me?

I did not look closely at the page you are trying to load, but here is what I think the problem may be: the link is loaded or generated dynamically via JavaScript. Jsoup does not run JavaScript, so the link never appears in the HTML that Jsoup parses.
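To illustrate the point, here is a minimal sketch of what Jsoup sees when a link is injected by a script. The HTML string is a made-up stand-in, not the real page; only the `AjaxStream` id is borrowed from the selector used further down, and the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class StaticParseDemo {
    // Hypothetical page: the container is empty in the markup,
    // and a script would fill it in once a browser runs it.
    static final String HTML =
        "<html><body>"
        + "<div id=\"AjaxStream\"></div>"
        + "<script>"
        + "document.getElementById('AjaxStream').innerHTML = "
        + "'<a href=\"http://example.com/video\">Play</a>';"
        + "</script>"
        + "</body></html>";

    // Parses the markup as-is and selects links directly inside the container.
    static Elements findStreamLinks(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("#AjaxStream > a");
    }

    public static void main(String[] args) {
        // Jsoup treats the script body as plain text, so no link is found.
        Elements links = findStreamLinks(HTML);
        System.out.println("Links found: " + links.size());
    }
}
```

The script is kept as text inside the `<script>` element and is never executed, so the selector matches nothing even though a browser would show the link.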

Two possible solutions:

1) Use something like Selenium WebDriver to access the content. The Java bindings let you remote-control a real browser, which should have no problems loading the page and running all of its scripts. This solution is simple to program but runs slowly, and it may depend on an external browser program that must be installed on the machine. An alternative to WebDriver is the JavaFX WebKit engine, in case you are on Java 8.

2) Analyse the network traffic and the JavaScript on the page and find out where the link comes from. This may take some time, but once you succeed you can use Jsoup to get all the data you need. This solution should run much faster than solution 1.
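As a rough sketch of solution 2: once the browser's network tab shows which request delivers the link, that request can be repeated with Jsoup directly. Everything specific here is an assumption. `AJAX_URL` is a placeholder rather than the site's real endpoint, the `X-Requested-With` header is only something some Ajax endpoints check, and the response is assumed to be an HTML fragment containing an `<a>` element. If the endpoint returns JSON instead, the HTML would have to be pulled out of the JSON first.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class AjaxLinkFetcher {
    // Placeholder endpoint: replace with the request URL seen in the network tab.
    static final String AJAX_URL = "http://example.com/ajax/stream?id=123";

    // Returns the absolute URL of the first link in an HTML fragment, or null if there is none.
    static String extractFirstLink(String fragment, String baseUri) {
        Element a = Jsoup.parseBodyFragment(fragment, baseUri).selectFirst("a[href]");
        return a == null ? null : a.absUrl("href");
    }

    public static void main(String[] args) throws IOException {
        String body = Jsoup.connect(AJAX_URL)
                .header("X-Requested-With", "XMLHttpRequest") // assumption: mimics the page's Ajax call
                .ignoreContentType(true)                       // accept non-HTML content types too
                .execute()
                .body();
        System.out.println(extractFirstLink(body, AJAX_URL));
    }
}
```

Keeping the parsing in `extractFirstLink` means it can be tried on a saved response before any live request is made.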

One solution, and probably the easiest, would be to use Selenium:

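import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
WebDriver driver = new FirefoxDriver(); // starts a real Firefox, which runs the page's JavaScript
driver.get("http://kinox.to/Stream/Prison_Break.html");
String mylink = driver.findElement(By.cssSelector("#AjaxStream > a")).getAttribute("href"); // the URL itself; getText() would return only the link's visible text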
