I extracted data from blogs using ArticleExtractor, which returns each article as a string. Some pages contain sub-links that lead to the actual news content, and I want that data extracted too. How can I access the data behind the sub-links? My code is this:
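import java.net.URL;
import org.xml.sax.InputSource;
import de.l3s.boilerpipe.document.TextDocument;
import de.l3s.boilerpipe.extractors.ArticleExtractor;
import de.l3s.boilerpipe.sax.BoilerpipeSAXInput;
import de.l3s.boilerpipe.sax.HTMLFetcher;

String news = " ";
try {
    URL url = new URL("http://www.firstpost.com/tag/crime-in-india");
    // Fetch the page and parse it into boilerpipe's document model
    InputSource is = HTMLFetcher.fetch(url).toInputSource();
    BoilerpipeSAXInput in = new BoilerpipeSAXInput(is);
    TextDocument doc = in.getTextDocument();
    // Keep only the main article text (plain text, markup removed)
    news = ArticleExtractor.INSTANCE.getText(doc);
} catch (Exception e) {
    // The original snippet had no catch block, so it would not compile
    e.printStackTrace();
}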
import net.sf.json.JSON;
import net.sf.json.xml.XMLSerializer;

// XMLSerializer.read expects well-formed XML in the string it is given
XMLSerializer xmlSerializer = new XMLSerializer();
JSON json = xmlSerializer.read(news);
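For the sub-link part of the question, here is a minimal sketch of one possible approach using only the Java standard library. It assumes you have the raw HTML of the listing page as a string (the text returned by ArticleExtractor no longer contains the links). The class name `SubLinkCollector`, the method `collect`, and the same-host filter are illustrative choices, not part of boilerpipe, and the regular expression is a rough heuristic rather than a real HTML parser.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubLinkCollector {
    // Matches href="..." or href='...' inside <a> tags (heuristic, not a full HTML parser).
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns absolute links found in html that point to the same host as baseUrl. */
    public static Set<String> collect(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Set<String> links = new LinkedHashSet<>(); // keeps page order, drops duplicates
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            if (href.isEmpty() || href.startsWith("#") || href.startsWith("javascript:")) {
                continue; // nothing to fetch for these
            }
            try {
                URI resolved = base.resolve(href); // turns relative links into absolute ones
                if (base.getHost() != null && base.getHost().equalsIgnoreCase(resolved.getHost())) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not valid URIs.
            }
        }
        return links;
    }
}
```

Each URL returned by `collect` could then be passed through the same HTMLFetcher, BoilerpipeSAXInput, and ArticleExtractor steps shown in the question to get the text of the linked article.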
Check the library imports on your build path, especially in Eclipse.
I had this issue in two separate projects, and it turned out that older versions of the net.sf.json library were on the build path in addition to json-lib-2.4-jdk15.jar.
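One way to confirm which jar a class is actually loaded from is to ask its code source at runtime. This is a minimal sketch under stated assumptions: `JarLocator` and `locate` are hypothetical names, and in the real project you would pass `net.sf.json.xml.XMLSerializer.class` instead of the placeholder classes used in `main`.

```java
import java.net.URL;
import java.security.CodeSource;

public class JarLocator {
    /** Returns the location a class was loaded from, or a note when none is recorded. */
    public static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Typical for classes loaded by the bootstrap loader, such as java.lang.String
            return "(no code source recorded)";
        }
        URL location = source.getLocation();
        return location.toString();
    }

    public static void main(String[] args) {
        // Placeholder classes; swap in net.sf.json.xml.XMLSerializer.class in the real project
        System.out.println(locate(JarLocator.class));
        System.out.println(locate(String.class));
    }
}
```

If the printed path points at an older json-lib jar than the one you expect, removing that jar from the build path is the likely fix.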