Extracting an element from an HTML document for an Android app using Jsoup

I am trying to extract the text from the div with this id, to use in an Android app I am building.

<div id="114561_435450">CSE423 - DMH - UB30301<br></div>

As I am using the Jsoup library, I have already tried getElementById("114561_435450") and div[id=114561_435450].text(). I am pretty frustrated right now. Any kind of help is appreciated. Thanks in advance.

Using pure JavaScript, the following should work:

document.getElementById("114561_435450").innerHTML

If you can use jQuery, the following should also work:

$("#114561_435450").html()

I see two possible reasons why your code may not work:

  1. The id is changing with each request to the page. This is easy to check: load the URL again in a browser and see whether the id changed. Do not forget to clear the cache and cookies between tests. If the id does change, you need to find out more about the structure of the document in order to find the correct div.

  2. The content of the document may be filled in by AJAX and thus not be directly accessible to you. You can find out by looking at a) the HTML of the URL as loaded via curl, or printed out from the Jsoup document, and b) the network traffic when the page loads (Developer Tools in Chrome or Firefox). If this is the case, you should find out the URL of the AJAX call and fetch that instead of the original URL.
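Both checks come down to asking whether the div is present in the HTML that Jsoup actually parses. A minimal sketch of that check follows, assuming Jsoup is on the classpath; the class name DivTextCheck, the helper textById, and the inline HTML strings are made up for illustration and stand in for the real page, which in the app would come from something like Jsoup.connect(url).get().

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class DivTextCheck {

    // Hypothetical helper: returns the text of the element with the given id,
    // or null if the parsed (static) HTML does not contain such an element.
    static String textById(String html, String id) {
        Document doc = Jsoup.parse(html);
        Element el = doc.getElementById(id);
        return el == null ? null : el.text();
    }

    public static void main(String[] args) {
        // Stand-in for a page whose static HTML already contains the div.
        String present = "<html><body>"
                + "<div id=\"114561_435450\">CSE423 - DMH - UB30301<br></div>"
                + "</body></html>";

        // Stand-in for a page where the id changed or the div is filled in
        // later by JavaScript, so the static HTML lacks it.
        String absent = "<html><body><div id=\"content\"></div></body></html>";

        System.out.println(textById(present, "114561_435450")); // the div's text
        System.out.println(textById(absent, "114561_435450"));  // null
    }
}
```

If the helper returns null for the document fetched by the app while the browser shows the div, one of the two reasons above is the likely cause, and printing doc.outerHtml() shows what Jsoup really received.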

Another solution to your problem may be to use Selenium WebDriver. With it you remote-control a real browser, which should be perfectly able to execute any JavaScript that populates the DOM.

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original post's address.