
X509 Certificate Exception while crawling some URLs with StormCrawler

I have been using StormCrawler to crawl websites. For HTTPS, I use the default https protocol implementation configured in StormCrawler. However, when I crawl some websites I get the exception below:

Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141) ~[?:1.8.0_131]
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126) ~[?:1.8.0_131]
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) ~[?:1.8.0_131]
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382) ~[?:1.8.0_131]
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292) ~[?:1.8.0_131]
at sun.security.validator.Validator.validate(Validator.java:260) ~[?:1.8.0_131]
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324) ~[?:1.8.0_131]
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229) ~[?:1.8.0_131]
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124) ~[?:1.8.0_131]
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1496) ~[?:1.8.0_131]
... 20 more

Is there a mechanism to automatically download the certificate and configure the crawler to use it? If so, how should I set the crawler's configuration?

This problem is not specific to StormCrawler. This answer explains that you can import the certificates by hand, which is not really an option unless you are crawling that one site specifically. Another option is to disable certificate validation. This would require modifying the protocol implementation, but should be doable.
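To make the "disable certificate validation" option concrete, here is a minimal sketch using only the standard JDK classes. `TrustAllSslContext` and `TrustAllManager` are hypothetical names chosen for illustration; they are not part of StormCrawler, and how the resulting `SSLContext` gets wired into the protocol implementation is left open. Note that this accepts any certificate, so it removes the protection against man-in-the-middle attacks.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/**
 * Hypothetical helper (not part of StormCrawler).
 * INSECURE: the resulting SSLContext accepts any certificate chain.
 */
final class TrustAllSslContext {

    /** Trust manager that performs no certificate checks at all. */
    static final class TrustAllManager implements X509TrustManager {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: client certificates are never rejected.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: server certificates are never rejected,
            // so no certification path is ever built or validated.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            // No specific issuers are advertised.
            return new X509Certificate[0];
        }
    }

    private TrustAllSslContext() {
    }

    /** Builds a TLS context whose only trust manager accepts everything. */
    static SSLContext create() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            // null key managers: no client certificate is presented.
            context.init(null, new TrustManager[] {new TrustAllManager()}, new SecureRandom());
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not initialise trust-all SSLContext", e);
        }
    }
}
```

With Apache HttpClient 4.4 or later, such a context could in principle be passed to `HttpClientBuilder.setSSLContext(...)`. Hostname verification is a separate check and would still apply unless it is relaxed as well.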

Have you tried the OkHttp implementation? It might behave differently from the Apache HttpClient one. See the okhttp wiki.
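As a rough illustration of switching implementations, the crawler configuration would look something like the fragment below. The keys and class name are given as I recall them from the StormCrawler documentation for the `com.digitalpebble` releases; treat them as an assumption and check them against the version you are running.

```yaml
# Assumed keys and class name; verify against your StormCrawler version.
http.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
https.protocol.implementation: "com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol"
```

Switching the implementation does not by itself make an untrusted certificate trusted; it only changes which HTTP client performs the TLS handshake.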
