
How to get the main content of an article from HTML using boilerpipe?

I am trying to extract the main content of an article from an HTML page using boilerpipe.

I downloaded the latest JARs from here.

I am trying to use the following code:

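import de.l3s.boilerpipe.BoilerpipeProcessingException;
import de.l3s.boilerpipe.extractors.ArticleExtractor;

String article = "";
try {
    // url holds the address of the page to extract
    article = ArticleExtractor.INSTANCE.getText(url);
    System.out.println("Article ++++ >>" + article);
} catch (BoilerpipeProcessingException e) {
    e.printStackTrace();
}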

But this returns an empty string for every URL. Can anyone help me with this?

Have you tried passing the HTML itself instead of the URL? Alternatively, there may be a problem with the way your URL strings are formatted.
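One thing worth checking, as far as I know: boilerpipe's getText has overloads for both String and java.net.URL, and the String overload treats its argument as HTML markup rather than as an address. If url in the question is a String, the extractor may be parsing the address text itself, which would explain an empty result. A minimal sketch of ruling out badly formatted URL strings before calling the URL overload follows. UrlCheck and parseHttpUrl are hypothetical names made up for this example, not part of boilerpipe, and the example.com address is a placeholder.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, not part of boilerpipe: checks that a string
// is a well-formed http(s) URL and converts it to java.net.URL.
class UrlCheck {

    static URL parseHttpUrl(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalArgumentException("URL string is empty");
        }
        try {
            // URI parsing rejects illegal characters such as embedded spaces.
            URI uri = new URI(raw.trim());
            String scheme = uri.getScheme();
            if (scheme == null
                    || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                throw new IllegalArgumentException("Expected an http or https URL: " + raw);
            }
            if (uri.getHost() == null) {
                throw new IllegalArgumentException("URL has no host: " + raw);
            }
            return uri.toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException("Malformed URL: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // Placeholder address; surrounding whitespace is trimmed before parsing.
        URL url = parseHttpUrl(" https://example.com/news/story.html ");
        System.out.println(url);
        // Assumed next step, with boilerpipe on the classpath:
        // String article = ArticleExtractor.INSTANCE.getText(url);
    }
}
```

If the URL overload still returns an empty string, downloading the page yourself and passing the HTML text to getText would at least show whether the fetch or the extraction is the step that fails.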

