I'm using Apache HttpComponents to GET some web pages for some crawled URLs. Many of those URLs actually redirect to different URLs (e.g. because they have been processed with a URL shortener). In addition to downloading the content, I would like to resolve the final URLs (i.e. the URL that provided the downloaded content), or even better, all URLs in the redirect chain.
I have been looking through the API docs, but have no clue where I could hook in. Any hints would be greatly appreciated.
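One approach is to turn off automatic redirect handling by setting the relevant parameter, handle 3xx responses yourself, and manually extract the redirect target from the response's "Location" header.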
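A minimal sketch of this manual approach follows. It assumes the JDK's own `HttpURLConnection` instead of HttpComponents so that it stays self-contained; in HttpClient 4.x the matching switch is the `ClientPNames.HANDLE_REDIRECTS` parameter set to `false`. `RedirectChain`, `Fetcher`, `Hop` and `follow` are hypothetical names, and the `Fetcher` interface exists only so the loop can be exercised without a network. Relative `Location` values are resolved against the URI that produced them.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: follows redirects by hand and records every URI visited.
class RedirectChain {

    // Outcome of one request: status code and raw Location header (may be null).
    public static final class Hop {
        final int status;
        final String location;

        public Hop(int status, String location) {
            this.status = status;
            this.location = location;
        }
    }

    // "Send one request without following redirects."
    public interface Fetcher {
        Hop fetch(URI uri) throws IOException;
    }

    // Real fetcher: a HEAD request with automatic redirect handling switched off.
    public static final Fetcher HTTP = uri -> {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        try {
            conn.setInstanceFollowRedirects(false);
            conn.setRequestMethod("HEAD");
            return new Hop(conn.getResponseCode(), conn.getHeaderField("Location"));
        } finally {
            conn.disconnect();
        }
    };

    // Returns the chain in request order; the last element is the final URI.
    public static List<URI> follow(URI start, Fetcher fetcher, int maxRedirects)
            throws IOException {
        List<URI> chain = new ArrayList<>();
        URI current = start;
        chain.add(current);
        int redirects = 0;
        while (true) {
            Hop hop = fetcher.fetch(current);
            boolean isRedirect = hop.status >= 300 && hop.status < 400
                    && hop.location != null;
            if (!isRedirect) {
                return chain; // e.g. 200 or 404: this URI is the end of the chain
            }
            if (redirects == maxRedirects) {
                throw new IOException("More than " + maxRedirects + " redirects from " + start);
            }
            // Location may be relative, so resolve it against the current URI.
            current = current.resolve(hop.location);
            chain.add(current);
            redirects++;
        }
    }
}
```

Calling `RedirectChain.follow(URI.create(url), RedirectChain.HTTP, 5)` returns the whole chain with the final URL last. HEAD keeps each hop cheap, but some servers answer HEAD differently from GET, so switching the method to GET may be needed in practice.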
Another option is to keep automatic redirect handling and record each hop as it is followed. Here's a full demo of how to do it using Apache HttpComponents. You'll need to extend `DefaultRedirectStrategy`, like so:
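```java
import java.net.URI;
import java.util.Deque;
import java.util.LinkedList;
import org.apache.http.HttpRequest;
import org.apache.http.HttpResponse;
import org.apache.http.ProtocolException;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.impl.client.DefaultRedirectStrategy;
import org.apache.http.protocol.HttpContext;

// Redirect strategy that records every URI the client is redirected to.
class SpyStrategy extends DefaultRedirectStrategy {
    // push() adds to the head, so the most recent (final) URI comes first.
    public final Deque<URI> history = new LinkedList<>();

    public SpyStrategy(URI uri) {
        history.push(uri); // the original request URI
    }

    @Override
    public HttpUriRequest getRedirect(
            HttpRequest request,
            HttpResponse response,
            HttpContext context) throws ProtocolException {
        // Let the default strategy build the follow-up request, then record its URI.
        HttpUriRequest redirect = super.getRedirect(request, response, context);
        history.push(redirect.getURI());
        return redirect;
    }
}
```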
The `expand` method sends a HEAD request, which makes the client collect URIs in the `spy.history` deque as it follows redirects automatically:
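```java
import java.io.IOException;
import java.net.URI;
import java.util.Deque;
import org.apache.http.client.methods.HttpHead;
import org.apache.http.impl.client.DefaultHttpClient;

class UrlExpander {
    // Follows redirects for `uri` and returns every URI visited,
    // most recent (final) URI first.
    public static Deque<URI> expand(String uri) {
        HttpHead head = new HttpHead(uri);
        SpyStrategy spy = new SpyStrategy(head.getURI());
        DefaultHttpClient client = new DefaultHttpClient();
        client.setRedirectStrategy(spy);
        try {
            // FIXME: the following completely ignores HTTP errors:
            client.execute(head);
            return spy.history;
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            // Release the connections held by this client.
            client.getConnectionManager().shutdown();
        }
    }
}
```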
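Because `SpyStrategy` records hops with `Deque.push`, which inserts at the head, the deque returned by `expand` is most-recent-first: the final URL is `getFirst()` and the original URL is `getLast()`. The sketch below turns such a deque into request order; the `RedirectHistory` class and its method names are hypothetical helpers, not part of HttpComponents.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical helpers for reading a most-recent-first redirect history.
class RedirectHistory {

    // The URI that finally served the content (the last one pushed).
    public static URI finalUri(Deque<URI> history) {
        return history.getFirst();
    }

    // The full chain in the order the requests were made: original URI first.
    public static List<URI> inRequestOrder(Deque<URI> history) {
        List<URI> chain = new ArrayList<>(history.size());
        Iterator<URI> it = history.descendingIterator(); // tail (oldest) to head
        while (it.hasNext()) {
            chain.add(it.next());
        }
        return chain;
    }
}
```

For example, `RedirectHistory.finalUri(expand(url))` gives the URL that served the content, and `RedirectHistory.inRequestOrder(expand(url))` gives the whole chain starting from `url`.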
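You may want to set the maximum number of redirects to follow to something reasonable (instead of the default of 100), like so:

```java
import org.apache.http.client.params.ClientPNames;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.BasicHttpParams;
BasicHttpParams params = new BasicHttpParams();
params.setIntParameter(ClientPNames.MAX_REDIRECTS, 5);
DefaultHttpClient client = new DefaultHttpClient(params);
```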