Reading the page source inside <form> of a web page

Can anyone help me read the page source inside the <form> tag?

I have tried HtmlUnit and jsoup, but they return only the contents inside the [...] and [...] tags. Any response is highly appreciated.

In jsoup, use element.html() to read the HTML inside an element; element.text() returns only its text. If you also need the enclosing tag itself, use element.outerHtml().

For example:

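import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Quotes inside the string literal must be escaped
String html = "<p>An </p><form action=\"SOMESERVLET\"><b>example</b></form> ";
Document doc = Jsoup.parse(html);
String htmlContent = doc.select("form").first().html(); // "<b>example</b>"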

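For your case:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Jsoup.connect() needs an absolute URL, including the scheme
Document doc = Jsoup.connect("https://example.com").get();
// Print the inner HTML of every <form> on the page
for (Element form : doc.select("form")) {
    System.out.println(form.html());
}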

Step by step

  • read the HTML from the URL into a string
  • find the <form> tag; its position is the start index
  • find the </form> tag; its position is the end index (if this tag is not present, the end index is the length of the string)
  • take the substring from the start index to the end index
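
The steps above can be sketched in plain Java, without any parsing library. This is only a minimal sketch under some assumptions: the class name FormExtractor and the methods fetch and extractFirstForm are hypothetical names introduced here, fetch relies on java.net.http.HttpClient (Java 11 or later), the search is case-sensitive, and only the first form is returned. Unlike the steps as written, the result here includes the closing </form> tag when it is present.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class FormExtractor {

    // Step 1: read the HTML at the given URL into a string (Java 11+).
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Steps 2-4: return the first <form> ... </form> block,
    // or null if the page contains no <form> tag.
    static String extractFirstForm(String html) {
        // Start index: where the opening tag begins (case-sensitive search).
        int start = html.indexOf("<form");
        if (start < 0) {
            return null;
        }
        // End index: just past the closing tag, or the end of the string
        // if the closing tag is missing.
        int close = html.indexOf("</form>", start);
        int end = (close < 0) ? html.length() : close + "</form>".length();
        return html.substring(start, end);
    }

    public static void main(String[] args) throws Exception {
        // With a URL argument, fetch that page; otherwise use a fixed sample string.
        String html = args.length > 0
                ? fetch(args[0])
                : "<p>An </p><form action=\"SOMESERVLET\"><b>example</b></form> ";
        System.out.println(extractFirstForm(html));
    }
}
```

Plain string search does not understand HTML, so a "<form" that appears inside a comment or a script would match as well; a parser such as jsoup avoids that problem.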

It is a simple algorithm, but I think there are many tools that can help you.
