Java URL library for grabbing lines on a website
I want to be able to grab N lines (HTML text content that starts on new lines) from a specific URL, e.g. www.sitename.com, and store them as strings in an array.
Something like:
public void grabLines(){
//create instance of class from imported library
//pass sitename into it
//from the instance, call a method for grabbing the lines on the site and pass in "N" as a parameter
//the method returns an array/list of N Strings that I can access later
}
Is there a native Java library I can import to do this? Does it allow me to do what I want easily?
Thanks
Are you trying to make a screen scraper? You will be pulling HTML as opposed to just what you see. Also, if the website is dynamic, you won't be able to pull everything that you can see. If you just want the HTML, you can try something like this. I tried to build a Bloomberg screen scraper and then parse out the random HTML tags.
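import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

try {
    URL bbg = new URL("http://www.bloomberg.com/markets/economic-calendar/");
    // try-with-resources closes the reader once the page has been read
    try (BufferedReader r = new BufferedReader(new InputStreamReader(bbg.openStream()))) {
        String temp; // holds the current line of raw HTML
        while ((temp = r.readLine()) != null) {
            System.out.println(temp);
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}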
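To get closer to the grabLines() outline in the question, the same URL / Reader approach can be wrapped in a method that stops after N lines and returns them as a list. This is only a minimal sketch: the class name LineGrabber and the helper readLines are made up for illustration, and it assumes the page is served as UTF-8. It returns raw HTML lines, not the rendered text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineGrabber {

    // Reads at most n lines from any Reader.
    // Returns fewer than n lines if the input ends early.
    static List<String> readLines(Reader source, int n) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while (lines.size() < n && (line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    // Opens the URL and returns its first n lines of raw HTML.
    // Assumption: the page is UTF-8 encoded.
    static List<String> grabLines(String address, int n) throws IOException {
        URL url = new URL(address);
        try (Reader in = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
            return readLines(in, n);
        }
    }
}
```

Calling LineGrabber.grabLines("http://www.sitename.com", 10) would then give a List<String> with up to 10 lines. The reading logic is kept separate from the URL handling so it can be tried on any Reader (for example a StringReader) without a network connection.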
Apache HttpClient is an abstraction on top of the URL / Reader technique above, but it is similar: Apache HTTP Client