I am developing a Grails app that uses crawler4j.
I know this is an old question, and I came across this solution here.
I tried the solution provided, but I am not sure where to put the custom fetcher and mock SSL Java files.
I am also not sure how these two classes get called for URLs that start with https://...
Thanks in advance.
The solution works fine. Maybe you are having trouble working out where to put the code. Here is how I use it:
When creating the crawler, you will have something like this in your main class, as shown in the official documentation:
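    import edu.uci.ics.crawler4j.crawler.CrawlConfig;
    import edu.uci.ics.crawler4j.crawler.CrawlController;
    import edu.uci.ics.crawler4j.fetcher.PageFetcher;
    import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
    import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

    public class Controller {
        public static void main(String[] args) throws Exception {
            CrawlConfig config = new CrawlConfig();
            // crawlStorageFolder is a String path defined elsewhere, as in the documentation example.
            config.setCrawlStorageFolder(crawlStorageFolder);
            // The custom fetcher only swaps the https scheme when this flag is set.
            config.setIncludeHttpsPages(true);

            /*
             * Instantiate the controller for this crawl.
             * The custom fetcher takes the place of the default PageFetcher.
             */
            PageFetcher pageFetcher = new MockSSLSocketFactory(config);
            RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
            RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
            CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
            ....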
Here you use MockSSLSocketFactory, which is defined as shown in the link you posted:
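    import edu.uci.ics.crawler4j.crawler.CrawlConfig;
    import edu.uci.ics.crawler4j.fetcher.PageFetcher;
    import org.apache.http.conn.scheme.Scheme;

    public class MockSSLSocketFactory extends PageFetcher {
        public MockSSLSocketFactory(CrawlConfig config) {
            super(config);
            if (config.isIncludeHttpsPages()) {
                try {
                    // Replace the default https scheme with one backed by SimpleSSLSocketFactory.
                    httpClient.getConnectionManager().getSchemeRegistry().unregister("https");
                    httpClient.getConnectionManager().getSchemeRegistry()
                            .register(new Scheme("https", 443, new SimpleSSLSocketFactory()));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }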
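As for how this class comes into play for https:// URLs: HttpClient's scheme registry maps a scheme name to a socket factory, so once the constructor above re-registers "https", connections for https:// URLs go through SimpleSSLSocketFactory. The following is only a toy model of that lookup in plain Java. SchemeLookupSketch and its string labels are made up for illustration and are not the HttpClient or crawler4j API.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model: it only mimics the "scheme name -> socket factory" lookup.
class SchemeLookupSketch {

    // Strings stand in for real socket factory objects.
    private final Map<String, String> factoriesByScheme = new HashMap<>();

    SchemeLookupSketch() {
        // Defaults a client might start with.
        factoriesByScheme.put("http", "plain socket factory");
        factoriesByScheme.put("https", "strict SSL socket factory");
    }

    // Replaces whatever was registered under the scheme and returns the old entry.
    String register(String scheme, String factory) {
        return factoriesByScheme.put(scheme, factory);
    }

    // Picks the factory from the scheme part of the URL.
    String factoryFor(String url) {
        String scheme = URI.create(url).getScheme();
        return factoriesByScheme.get(scheme);
    }

    public static void main(String[] args) {
        SchemeLookupSketch registry = new SchemeLookupSketch();
        registry.register("https", "lenient SSL socket factory");
        System.out.println(registry.factoryFor("https://example.com/"));
        System.out.println(registry.factoryFor("http://example.com/"));
    }
}
```

In other words, you never call the two classes yourself for https:// URLs; you only pass the custom fetcher to the CrawlController, and the registered scheme decides which socket factory is used.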
As you can see, this uses the class SimpleSSLSocketFactory, which can be defined as shown in the example from the link:
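    import java.io.IOException;
    import java.security.KeyManagementException;
    import java.security.KeyStoreException;
    import java.security.NoSuchAlgorithmException;
    import java.security.UnrecoverableKeyException;
    import java.security.cert.CertificateException;
    import javax.net.ssl.SSLException;
    import javax.net.ssl.SSLSession;
    import javax.net.ssl.SSLSocket;
    import org.apache.http.conn.ssl.SSLSocketFactory;
    import org.apache.http.conn.ssl.TrustStrategy;
    import org.apache.http.conn.ssl.X509HostnameVerifier;

    public class SimpleSSLSocketFactory extends SSLSocketFactory {

        public SimpleSSLSocketFactory() throws NoSuchAlgorithmException, KeyManagementException, KeyStoreException,
                UnrecoverableKeyException {
            super(trustStrategy, hostnameVerifier);
        }

        // Accepts every host name: none of the verify methods ever rejects.
        private static final X509HostnameVerifier hostnameVerifier = new X509HostnameVerifier() {
            @Override
            public void verify(String host, SSLSocket ssl) throws IOException {
                // Do nothing
            }

            @Override
            public void verify(String host, String[] cns, String[] subjectAlts) throws SSLException {
                // Do nothing
            }

            @Override
            public boolean verify(String s, SSLSession sslSession) {
                return true;
            }

            @Override
            public void verify(String arg0, java.security.cert.X509Certificate arg1) throws SSLException {
                // Do nothing
            }
        };

        // Trusts every certificate chain, including self-signed or expired ones.
        private static final TrustStrategy trustStrategy = new TrustStrategy() {
            @Override
            public boolean isTrusted(java.security.cert.X509Certificate[] arg0, String arg1) throws CertificateException {
                return true;
            }
        };
    }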
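If it helps to see the idea without the HttpClient classes, here is a rough JDK-only sketch of the same trust-everything approach. TrustAllSketch, TRUST_ALL, ANY_HOST and trustAllContext are names I made up for the illustration; crawler4j does not use this class. Like SimpleSSLSocketFactory, it skips certificate and host name checks entirely, so it only makes sense for crawling public pages.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical illustration class; not part of crawler4j or HttpClient.
class TrustAllSketch {

    // Accepts every certificate chain, like a TrustStrategy that always returns true.
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Do nothing: not throwing means "trusted".
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Do nothing: not throwing means "trusted".
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    // Accepts every host name, like the no-op hostname verifier.
    static final HostnameVerifier ANY_HOST = (host, session) -> true;

    // Builds a TLS context whose only trust manager is TRUST_ALL.
    static SSLContext trustAllContext() throws GeneralSecurityException {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, new TrustManager[] { TRUST_ALL }, null);
        return context;
    }

    public static void main(String[] args) throws Exception {
        SSLContext context = trustAllContext();
        System.out.println("Created context for protocol " + context.getProtocol());
    }
}
```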
As you can see, I am only copying code from the official documentation and the link you posted, but I hope that seeing it all together makes things clearer for you.