
The best way to get data from my blog

What is the best way to get data from my own HTML-based blog?

I have a simple blog with news. Each news item is in a div with the class "news". Every hour, I would like my Android application to check whether new items have appeared. I don't want to use RSS or XML.

What is the best way to do this?

JSoup is the solution.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.

jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.

I believe this will serve as a starting point:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);          // parse the string into a DOM
Element link = doc.select("a").first();    // first <a> matched by the CSS selector

String text = doc.body().text(); // "An example link."
String linkHref = link.attr("href"); // "http://example.com/"
String linkText = link.text(); // "example"

String linkOuterH = link.outerHtml();
    // "<a href="http://example.com/"><b>example</b></a>"
String linkInnerH = link.html(); // "<b>example</b>"
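The question also asks how to tell whether new items have appeared since the last hourly check. With jsoup you would typically collect the items with a CSS selector such as doc.select("div.news") and then compare them with what you saw last time. The sketch below covers only that comparison step and uses nothing outside the JDK; NewsTracker and findNew are hypothetical names, and it assumes the text of each div.news uniquely identifies an item.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper: remembers which news items have already been seen
 * and reports only the ones that are new since the previous check.
 * Items are identified by a string key (assumed: the text of each div.news).
 */
class NewsTracker {
    // Keys of every item seen so far, kept in insertion order.
    private final Set<String> seen = new LinkedHashSet<>();

    /**
     * Returns the keys from currentItems that were not present in any earlier
     * call, in page order, and records them as seen.
     */
    List<String> findNew(List<String> currentItems) {
        List<String> fresh = new ArrayList<>();
        for (String key : currentItems) {
            // Set.add returns false when the key was already recorded,
            // so repeated items (including duplicates on one page) are skipped.
            if (seen.add(key)) {
                fresh.add(key);
            }
        }
        return fresh;
    }
}
```

With a recent jsoup version, the keys could come from doc.select("div.news").eachText(). Because the seen set lives only in memory, it is lost when the app process ends, so an Android app would likely need to persist it between checks.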

Update:
As suggested by hexafraction, you can use RSS. RSS is a format for delivering regularly changing web content, such as news, and many websites use it to help their users stay up to date. It describes your content (title, description, link, etc.) in XML, which you can parse to display the data to the user.

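Parsing an RSS feed with an XML parser will be much easier than parsing the HTML with jsoup, because an RSS feed has a fixed, well-defined structure, while a page's HTML layout may change whenever the blog's template does. This about.com article will help you add RSS to your website.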
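If you do go the RSS route, the JDK's built-in DOM parser (javax.xml.parsers, also available on Android) is enough to read a feed. This is a minimal sketch, not a complete RSS client: it assumes an RSS 2.0 feed whose <item> elements contain <title> and <link>, takes the feed as a string you have already downloaded, and uses a hypothetical RssItems class. Note that Document and Element here are the org.w3c.dom types, not jsoup's.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper that extracts the title and link of each RSS 2.0 <item>. */
class RssItems {
    /** Returns "title | link" strings, one per <item>, in document order. */
    static List<String> parse(String rssXml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrap checked parser exceptions so callers get one unchecked failure.
            throw new IllegalArgumentException("Could not parse RSS feed", e);
        }

        List<String> result = new ArrayList<>();
        // Only <item> elements are visited, so the channel's own <title>/<link> are ignored.
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            result.add(childText(item, "title") + " | " + childText(item, "link"));
        }
        return result;
    }

    // Trimmed text of the first descendant with the given tag, or "" if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }
}
```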
