Implementation of crawler4j

I am trying to get the basic crawler4j example (BasicCrawlController) running. I have modified the first few lines to define the rootFolder and numberOfCrawlers as follows:

public class BasicCrawlController {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Needed parameters: ");
            System.out.println("\t rootFolder (it will contain intermediate crawl data)");
            System.out.println("\t numberOfCralwers (number of concurrent threads)");
            return;
        }

        /*
         * crawlStorageFolder is a folder where intermediate crawl data is
         * stored.
         */
        String crawlStorageFolder = args[0];

        args[0] = "/data/crawl/root";

        /*
         * numberOfCrawlers shows the number of concurrent threads that should
         * be initiated for crawling.
         */
        int numberOfCrawlers = Integer.parseInt(args[1]);

        args[1] = "7";

        CrawlConfig config = new CrawlConfig();

        config.setCrawlStorageFolder(crawlStorageFolder);

No matter how I define them, I still receive this error:

Needed parameters: 
 rootFolder (it will contain intermediate crawl data)
 numberOfCralwers (number of concurrent threads)

I think that I need to "set the parameters in the Run Configurations" window, but I do not know what that means. How can I properly configure this basic crawler to get it up and running?

After you compile the program with the javac command, you need to run it by typing the following:

java BasicCrawlController "arg1" "arg2"

The error is telling you that you aren't specifying args[0] and args[1] when you run the program. Also, what is the purpose of args[1] = "7"; after you have already read the number-of-crawlers parameter?
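To illustrate why the hard-coded assignments in the question have no effect, here is a minimal sketch that needs no crawler4j. The class name ArgsOrderDemo and the sample path "/tmp/from-command-line" are hypothetical; only the order of statements is taken from the question.

```java
final class ArgsOrderDemo {

    // Mirrors the order of statements in the question: read first, overwrite later.
    static String readThenOverwrite(String[] args) {
        String crawlStorageFolder = args[0]; // copies the value currently stored in args[0]
        args[0] = "/data/crawl/root";        // changes the array slot, not the variable above
        return crawlStorageFolder;
    }

    public static void main(String[] args) {
        // Simulated command-line arguments (hypothetical values).
        String[] simulated = {"/tmp/from-command-line", "3"};
        System.out.println(readThenOverwrite(simulated)); // prints /tmp/from-command-line
        System.out.println(simulated[0]);                 // prints /data/crawl/root
    }
}
```

In the original program the situation is even simpler: when no arguments are passed, the args.length != 2 check prints the message and returns before any of the later lines run.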

From what it looks like you are trying to do, remove the first 5 lines of main (the args.length check), because you are using hard-coded values anyway. Then set the crawlStorageFolder String to your directory path and numberOfCrawlers to 7. That way you won't have to specify command-line parameters. If you do want to use command-line parameters, get rid of the hard-coded values above and specify them on the command line.
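One possible middle ground, shown here only as a sketch, is to fall back to the hard-coded values when no arguments are given and to let command-line arguments override them. The class CrawlSettings and its method fromArgs are hypothetical names, not part of crawler4j; the default values are the ones from the question.

```java
final class CrawlSettings {

    // Defaults taken from the values hard coded in the question.
    static final String DEFAULT_STORAGE_FOLDER = "/data/crawl/root";
    static final int DEFAULT_NUMBER_OF_CRAWLERS = 7;

    final String crawlStorageFolder;
    final int numberOfCrawlers;

    private CrawlSettings(String crawlStorageFolder, int numberOfCrawlers) {
        this.crawlStorageFolder = crawlStorageFolder;
        this.numberOfCrawlers = numberOfCrawlers;
    }

    // Uses args[0] and args[1] when present, otherwise the defaults above.
    static CrawlSettings fromArgs(String[] args) {
        String folder = args.length > 0 ? args[0] : DEFAULT_STORAGE_FOLDER;
        int crawlers = args.length > 1 ? Integer.parseInt(args[1]) : DEFAULT_NUMBER_OF_CRAWLERS;
        return new CrawlSettings(folder, crawlers);
    }

    public static void main(String[] args) {
        CrawlSettings settings = fromArgs(args);
        System.out.println("crawlStorageFolder = " + settings.crawlStorageFolder);
        System.out.println("numberOfCrawlers = " + settings.numberOfCrawlers);
        // In the real controller these values would then be handed on, for example:
        // config.setCrawlStorageFolder(settings.crawlStorageFolder);
    }
}
```

With this approach the program runs without any arguments, and running it with two arguments replaces both defaults.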
