Retrieve information from a website without APIs

I'm developing an Android application in Java. I've worked with APIs before, but I was wondering whether it's possible to retrieve information from a website without one.

For example, I'm trying to retrieve the hours of operation found here. If I use Inspect Element on the hours of operation, I can see that they sit under a heading called "Hours". Can I use these HTML tags to my advantage, i.e. grab that heading/paragraph, parse it, and extract the results I need?

Thank you!

PS. Apologies for the newbie question; I wasn't sure how to word it properly to get relevant Google results.

I'd like to add something to the comment by @Luciano Rodríguez.

As you know, you can read the content of the HTTP response and then parse it as HTML. As you mentioned, you can then access a specific element and get its value.
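A minimal sketch of that first step (reading the HTTP response body) in plain Java, using `HttpURLConnection`, which exists on both Android and the desktop JDK. The class and method names are illustrative, and the code assumes the page is UTF-8 encoded. On Android the network call must run off the main thread and the app needs the `INTERNET` permission.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class PageFetcher {

    // Reads an entire stream into a String, assuming UTF-8 encoding.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Downloads the raw HTML of a page; throws if the server does not answer 200 OK.
    static String fetchHtml(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(10_000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return readAll(in);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

The resulting string is what you then hand to an HTML parser.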

Now the question is how generic your application needs to be. If, for example, you want to get a couple of fields from one specific site, you already have all the tools: get the HTML, parse it with one of the available HTML parsers to extract the data, and you are done.
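For the single-site case, here is a rough sketch that looks for a heading with a given text (such as "Hours") and returns the text of the element right after it. It assumes the markup is well-formed (X)HTML so that the JDK's built-in XML parser can read it; real pages are rarely that clean, so in practice you would build the DOM with a lenient HTML parser (for example jsoup or HtmlCleaner) instead. `HoursExtractor` and `textAfterHeading` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class HoursExtractor {

    // Searches h1-h4 headings for one whose trimmed text equals headingText and returns
    // the trimmed text of its next element sibling, or null if no match is found.
    // Assumes well-formed (X)HTML input.
    static String textAfterHeading(String xhtml, String headingText) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        for (String tag : new String[] {"h1", "h2", "h3", "h4"}) {
            NodeList headings = doc.getElementsByTagName(tag);
            for (int i = 0; i < headings.getLength(); i++) {
                Node heading = headings.item(i);
                if (!headingText.equals(heading.getTextContent().trim())) {
                    continue;
                }
                // Skip whitespace text nodes until the next element sibling.
                Node next = heading.getNextSibling();
                while (next != null && next.getNodeType() != Node.ELEMENT_NODE) {
                    next = next.getNextSibling();
                }
                if (next != null) {
                    return next.getTextContent().trim();
                }
            }
        }
        return null;
    }
}
```

This hard-codes knowledge of one page layout, which is fine for a single site but is exactly what becomes a problem once you need to support many.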

However, if you are building a generic application that should support various sites and be configurable, you have a problem.

First, you have to be able to extract any data from the DOM. I am currently working on a similar task myself. I used HTMLCleaner to get a DOM from the HTML and XPath to configure the interesting nodes. But that is not all. Modern sites are very dynamic: very often the information is not generated as HTML on the server side but is built dynamically by JavaScript running on the client side. Supporting that case is not simple. Off the top of my head, there are two approaches:

  1. Use a fully functional headless browser that runs the web application and builds the DOM. Then you can get data from the DOM using XPath.
  2. Get the data from its source, e.g. HTML, XML, JSON, etc. This approach requires additional configuration for each site you want to support.
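The configurable, XPath-driven setup described above could look roughly like this: each site gets its own map of field name to XPath expression, and the same code evaluates whichever map applies. This is only a sketch. It builds the DOM with the JDK's XML parser rather than HTMLCleaner, so it assumes well-formed markup with no default XML namespace, and it does nothing about JavaScript-rendered content (that is what approach 1 is for). `ConfigurableScraper`, `extract`, and the example field names and expressions are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ConfigurableScraper {

    // Evaluates each configured XPath expression against the page and returns field -> text.
    // fieldXPaths is the per-site configuration (e.g. loaded from a file or a server).
    static Map<String, String> extract(Map<String, String> fieldXPaths, String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fieldXPaths.entrySet()) {
            // XPath.evaluate(expr, node) returns the string value of the first matching node,
            // or "" when nothing matches.
            String value = xpath.evaluate(field.getValue(), doc);
            result.put(field.getKey(), value.trim());
        }
        return result;
    }
}
```

A configuration for one site might map `"hours"` to `//h2[normalize-space()='Hours']/following-sibling::p[1]`. Supporting another site then means writing a new set of expressions rather than new code, which is where the per-site configuration effort ends up.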
