
Web crawler with a JavaScript array of URLs?

My users enter a list of URLs that they want crawled for the most recent updates. I'm familiar with Java, so I was trying to write the web crawler in Java, but I can't quite figure out how to get this array from JavaScript to Java. What would be the most compatible language for crawling each of the URLs in the array?

As far as I know, the usual way to send an array from JavaScript is to serialize it as JSON (for example with JSON.stringify) and send it to the server in an HTTP request. You can then parse the JSON on the Java side.

Refer to "How to parse JSON in Java".
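To make the parsing step concrete, here is a minimal sketch that uses only the standard library. It assumes the browser sent a flat JSON array of strings (the output of JSON.stringify on an array of URLs); the class name UrlArrayParser and the method parseUrlArray are made up for illustration. In a real application a JSON library such as Jackson or Gson is a safer choice than a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlArrayParser {
    // Matches one JSON string literal, allowing backslash escapes inside it.
    private static final Pattern JSON_STRING =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    /**
     * Extracts the strings from a flat JSON array such as
     * ["https://example.com/news", "https://example.org/blog"].
     * Hypothetical helper: nested arrays and objects are not supported.
     */
    public static List<String> parseUrlArray(String json) {
        String trimmed = json.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("Expected a JSON array");
        }
        List<String> urls = new ArrayList<>();
        Matcher m = JSON_STRING.matcher(trimmed);
        while (m.find()) {
            // Undo the \/ escape that some serializers emit;
            // other escapes are left untouched in this sketch.
            urls.add(m.group(1).replace("\\/", "/"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Stand-in for the request body received from the browser.
        String body = "[\"https://example.com/news\", \"https://example.org/blog\"]";
        for (String url : parseUrlArray(body)) {
            System.out.println(url);
        }
    }
}
```

The resulting List<String> can then be handed to whatever crawling code you write.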

For the web crawler in Java, you can refer to this link.

Unfortunately, Java's standard library doesn't make it especially easy to fetch a page and parse it: it can make HTTP requests, but it doesn't include an HTML parser. Fortunately, there's a lightweight and easy-to-use package called jsoup that handles both.

It takes about 700 lines of code to form the HTTP request and handle the response, and a few thousand more to parse the response. But because this is all neatly bundled up in the package for us, we only have to write a few lines of code ourselves.
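As a rough idea of what those few lines might look like, here is a sketch that assumes jsoup is on the classpath. The class name SimpleCrawler, the helper linksOf, the user agent string, and the 10-second timeout are all placeholders chosen for illustration; the list passed to crawl stands in for the URLs parsed from the JSON array.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SimpleCrawler {
    /** Returns the absolute URLs of all links in an already parsed page. */
    static List<String> linksOf(Document doc) {
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            // absUrl resolves relative hrefs against the document's base URI.
            links.add(a.absUrl("href"));
        }
        return links;
    }

    /** Fetches each URL in turn and prints its title and link count. */
    static void crawl(List<String> urls) {
        for (String url : urls) {
            try {
                Document doc = Jsoup.connect(url)
                        .userAgent("example-crawler") // placeholder value
                        .timeout(10_000)              // milliseconds
                        .get();
                System.out.println(url + " -> " + doc.title()
                        + " (" + linksOf(doc).size() + " links)");
            } catch (IOException e) {
                // One unreachable page should not stop the whole run.
                System.err.println("Failed to fetch " + url + ": " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        crawl(List.of("https://example.com/"));
    }
}
```

To detect "most recent updates", you would still need to compare what you fetch against what you saw on the previous run, which this sketch leaves out.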

So there you go. Hope it helps!

