

Fetch a Web Page and Save It in a Database?

How can I fetch an HTML page and save it to a database in Java? Is there a simple way to do this?

Receiving a file over HTTP is pretty easy using the URL class:

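// IOUtils.toString(URL, Charset) opens and closes the stream itself; UTF-8 is an assumption
String rawHtml = IOUtils.toString(new URL("http://yahoo.com"), StandardCharsets.UTF_8);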

IOUtils comes from Apache Commons IO (org.apache.commons.io); its toString method reads the whole response into one String. Unfortunately, opening the stream straight from java.net.URL gives you no control over anything besides the website's address (cookies, header information, ...) :-/ Personally, I use this approach wherever I can, since HttpClient's API is too complex (too many lines of code) just to retrieve the source code of a website.
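If you do need cookies or headers, java.net.URL can still get you there without HttpClient or Commons IO: `openConnection()` returns a `URLConnection`, and its `setRequestProperty` sets request headers before anything is sent. The sketch below assumes Java 9+ (for `InputStream.readAllBytes`) and a UTF-8 page; the class `PageFetcher`, its methods and the timeout values are hypothetical names and choices for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a JDK-only alternative to IOUtils.toString(...) that
// can also send headers such as User-Agent and Cookie.
class PageFetcher {

    // Creates the connection and sets headers; nothing goes over the
    // network until an input stream is requested.
    static URLConnection prepare(String address, String userAgent, String cookie) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(10_000); // milliseconds; arbitrary values for the sketch
        conn.setReadTimeout(10_000);
        conn.setRequestProperty("User-Agent", userAgent);
        if (cookie != null) {
            conn.setRequestProperty("Cookie", cookie);
        }
        return conn;
    }

    // Reads the whole stream into one String (InputStream.readAllBytes is Java 9+).
    static String readAll(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    // Connects, reads the body and closes the stream. Assumes UTF-8; a real
    // crawler would take the charset from the Content-Type header instead.
    static String fetch(String address, String userAgent, String cookie) throws IOException {
        URLConnection conn = prepare(address, userAgent, cookie);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }
}
```

Usage would be `String rawHtml = PageFetcher.fetch("http://yahoo.com", "Mozilla/5.0", null);`. It is still more code than the one-liner, which is the trade-off described above.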

Not sure about your exact requirements.

For something simple, you can use Apache HttpClient.
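The HttpClient suggested here is most likely Apache's. As a swapped-in alternative that the answer does not mention, Java 11 and later ship `java.net.http.HttpClient` in the JDK, which keeps a simple GET short while still allowing headers. A minimal sketch; the class name `SimpleGet`, the header value and the timeouts are hypothetical choices:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper around the JDK's own HttpClient (Java 11+).
class SimpleGet {

    // One shared client; it follows redirects, which many sites use.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Builds the GET request; custom headers (cookies etc.) would go here.
    static HttpRequest request(String address, String userAgent) {
        return HttpRequest.newBuilder(URI.create(address))
                .header("User-Agent", userAgent)
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Sends the request and returns the body as a String; ofString() decodes
    // with the charset from Content-Type, falling back to UTF-8.
    static String fetch(String address, String userAgent) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(request(address, userAgent), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```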

For something more complex, you can use Apache Nutch. It does crawling, indexing, and searching as well.
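Neither answer covers the other half of the question, storing the HTML. With plain JDBC that is one parameterized INSERT. A minimal sketch, assuming a hypothetical table `pages(url, html, fetched_at)` whose `html` column is a large text type (TEXT, CLOB, etc., depending on your database) and a `Connection` obtained from your driver or `DataSource`; `PageStore` and `save` are illustrative names:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

// Hypothetical DAO-style helper; table and column names are assumptions.
class PageStore {

    private static final String INSERT_SQL =
            "INSERT INTO pages (url, html, fetched_at) VALUES (?, ?, ?)";

    // Stores one fetched page. Bound parameters keep quotes and other
    // characters inside the HTML from breaking the SQL statement.
    static int save(Connection conn, String url, String html) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, url);
            ps.setString(2, html); // some drivers prefer setCharacterStream for very large text
            ps.setTimestamp(3, Timestamp.from(Instant.now()));
            return ps.executeUpdate(); // number of rows inserted
        }
    }
}
```

Usage would be `PageStore.save(conn, "http://yahoo.com", rawHtml)` inside a try-with-resources block that opens the `Connection`, for example with `DriverManager.getConnection(jdbcUrl, user, password)`, where the JDBC URL depends on your database.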
