
How to extract news content from a web page using Boilerpipe?

I need to extract the main news content from a web page. I searched the internet and found a freely available API named Boilerpipe for that purpose: http://boilerpipe-web.appspot.com/ However, I have not been able to find any Java implementations that make use of Boilerpipe. Can anyone tell me how to use Boilerpipe in Java to extract the news content, or point me to Java implementations that use Boilerpipe to extract content from a news web page?

Maybe my answer is too late, but it's pretty simple.

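 import java.net.URL;
 import de.l3s.boilerpipe.extractors.ArticleExtractor;
 // getText can throw BoilerpipeProcessingException, so call this from code that handles it
 URL url = new URL("http://www.nydailynews.com/sports/baseball");
 String content = ArticleExtractor.INSTANCE.getText(url);  // this contains the final text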

Simple, huh?

Suppose you need to extract this URL: just use my Boilerpipe alternative web API HERE. My service is based on Boilerpipe; I developed it because I kept getting over-quota errors from the original application. You have the option of getting the result back as JSON, so you can simply consume it in your application.
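For anyone wondering what consuming a JSON-returning extraction service from Java might look like, here is a minimal sketch using only the JDK (Java 11+). The endpoint address and the `url` and `output` query parameter names are hypothetical placeholders, not the documented interface of any particular service, so adjust them to whatever the service you use actually expects.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ExtractionClient {

    // Hypothetical placeholder endpoint; replace with the real service address.
    static final String PLACEHOLDER_ENDPOINT = "https://example.invalid/extract";

    /** Builds the request URI, percent-encoding the page URL so it is safe as a query value. */
    static URI buildRequestUri(String endpoint, String pageUrl) {
        String encoded = URLEncoder.encode(pageUrl, StandardCharsets.UTF_8);
        // "url" and "output" are assumed parameter names, not a documented API.
        return URI.create(endpoint + "?url=" + encoded + "&output=json");
    }

    /** Sends a GET request and returns the raw response body (expected to be JSON). */
    static String fetchJson(URI requestUri) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(requestUri)
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected HTTP status: " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // Without arguments, only show the URI that would be requested; no network call.
            System.out.println(buildRequestUri(PLACEHOLDER_ENDPOINT,
                    "http://www.nydailynews.com/sports/baseball"));
            return;
        }
        // args[0] = service endpoint, args[1] = page to extract
        System.out.println(fetchJson(buildRequestUri(args[0], args[1])));
    }
}
```

The response body comes back as a plain string here; parsing it needs a JSON library of your choice, since the JDK does not ship one. Encoding the page URL before appending it matters, because otherwise any `?` or `&` in the article address would be read as part of the service's own query string.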

Best Regards

