How do I get the main content of an article from HTML using boilerpipe?

Question

I am trying to get the main content of an article from an HTML page using boilerpipe.

I downloaded the latest jars from here.

I am trying to use the following code:

String article = "";
try {
    article = ArticleExtractor.INSTANCE.getText(url);   
    System.out.println("Article ++++ >>" + article);    
} catch (BoilerpipeProcessingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

But this returns an empty string for every URL. Can anyone help me with this?

Answer 1

Have you tried passing the HTML itself instead of the URL? There may also be a problem with the way your URL strings are formatted.
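To make that suggestion concrete, here is a minimal sketch of downloading the page yourself and handing the HTML string to the extractor. The class ArticleFetch, the TextExtractor interface, and the helper names are hypothetical and only for illustration. The sketch assumes the page is UTF-8 and that a browser-like User-Agent header is acceptable to the server; neither is confirmed by the question. It also assumes your boilerpipe version offers a getText overload that accepts an HTML string.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class ArticleFetch {

    /**
     * Hypothetical hook for the extractor. With boilerpipe on the classpath this
     * could be: html -> ArticleExtractor.INSTANCE.getText(html)
     */
    public interface TextExtractor {
        String extract(String html) throws Exception;
    }

    /** Reads the stream to the end and decodes it as UTF-8 (assumption: the page is UTF-8). */
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Downloads the raw HTML so it can be inspected before extraction. */
    public static String fetchHtml(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        // Assumption: some servers answer differently without a browser-like User-Agent.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    /** Fails loudly on empty HTML so a download problem is not mistaken for an extractor problem. */
    public static String extractArticle(String html, TextExtractor extractor) throws Exception {
        if (html == null || html.trim().isEmpty()) {
            throw new IllegalArgumentException("Downloaded HTML is empty; check the URL and the fetch step");
        }
        return extractor.extract(html);
    }
}
```

With the boilerpipe jars on the classpath, a call might look like ArticleFetch.extractArticle(ArticleFetch.fetchHtml(address), html -> ArticleExtractor.INSTANCE.getText(html)). Printing the length of the downloaded HTML before extraction shows whether the empty result comes from the download or from the extraction itself.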

Solution 1
2 Accepted 2016-10-10 07:18:31
