

Web crawler with a JavaScript array of URLs?

I have users enter a list of URLs that they want crawled for the most recent updates. I'm familiar with Java, so I was trying to write the web crawler in Java, but I can't quite figure out how to get this array from JavaScript to Java. Which language would be the most compatible, so that each of the URLs in the array can be crawled?

As far as I know, the usual way to send an array from JavaScript to a server is to serialize it as JSON. You can then parse the JSON on the Java side.

For details, refer to the question "How to parse JSON in Java".
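To make the Java side concrete, here is a minimal sketch that uses only the standard library. It assumes the browser sends something like `JSON.stringify(urls)`, i.e. a flat JSON array of strings. The class name `UrlListParser` and the method `parseUrlArray` are made up for this illustration. This is not a full JSON parser (it does not decode escapes such as `\u0026`), so in a real project a proper JSON library would be the safer choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlListParser {
    // Matches one JSON string literal and captures its contents
    // (backslash-escaped characters are allowed inside the quotes).
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Extracts the strings from a flat JSON array such as ["https://a", "https://b"]. */
    public static List<String> parseUrlArray(String json) {
        String trimmed = json.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("Expected a JSON array: " + json);
        }
        List<String> urls = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(trimmed);
        while (m.find()) {
            // JSON.stringify leaves "/" unescaped, but some encoders emit "\/".
            urls.add(m.group(1).replace("\\/", "/"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical payload, as it might arrive in a request body.
        String payload = "[\"https://example.com/news\", \"https://example.org/blog\"]";
        for (String url : parseUrlArray(payload)) {
            System.out.println(url);
        }
    }
}
```

The resulting `List<String>` can then be handed to whatever code does the crawling.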

For the web crawler in Java, you can refer to this link.

Unfortunately, Java doesn't come with all of the tools to make an HTTP request and parse the page in a super easy way. Fortunately, there's a really lightweight and easy-to-use package called jsoup that makes this very easy.

It takes about 700 lines of code to form the HTTP request and handle the response, and a few thousand lines of code to parse the response. But because all of this is neatly bundled up in the package for us, we only have to write a few lines of code ourselves.
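Those few lines might look roughly like the sketch below. It assumes jsoup is on the classpath; the class name `SimpleCrawler`, the helper `extractLinks`, the user-agent string, and the placeholder URL are all invented for illustration. It fetches each URL, prints the page title, and counts the links on the page.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SimpleCrawler {
    /** Returns the absolute URLs of all links found in an already-parsed page. */
    public static List<String> extractLinks(Document page) {
        List<String> links = new ArrayList<>();
        for (Element anchor : page.select("a[href]")) {
            // absUrl resolves relative hrefs against the page's base URI;
            // it returns an empty string if the link cannot be resolved.
            String absolute = anchor.absUrl("href");
            if (!absolute.isEmpty()) {
                links.add(absolute);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Placeholder list; in practice this would be the array sent from JavaScript.
        List<String> urls = List.of("https://example.com/");
        for (String url : urls) {
            try {
                Document page = Jsoup.connect(url)
                        .userAgent("SimpleCrawler/0.1") // hypothetical user agent
                        .timeout(10_000)                // milliseconds
                        .get();
                System.out.println(url + " -> " + page.title()
                        + " (" + extractLinks(page).size() + " links)");
            } catch (IOException e) {
                // One unreachable page should not stop the whole crawl.
                System.err.println("Failed to fetch " + url + ": " + e.getMessage());
            }
        }
    }
}
```

Checking for "the most recent updates" would still need some extra logic on top of this, for example comparing the fetched content with what was seen on the previous run.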

So there you go. Hope it helps!


 