
Multiple Threads downloading same file from sftp server

I have a system that, when it finds files of a certain type, downloads, encodes, and uploads them in a separate thread:

while (true) {
    for (SftpClient c : clients) {
        ArrayList<String> filenames;
        try {
            filenames = c.list("*.wav", "_rdy_");
        } catch (SftpException e) {
            e.printStackTrace();
            continue; // skip this client instead of reusing a stale list
        }
        if (!filenames.isEmpty()) {
            // AudioThread's run() method handles the download, encode, and upload
            AudioThread at = new AudioThread(filenames);
            at.setNode(c.getNode());
            Thread t = new Thread(at);
            t.start();
        }
    }
    try {
        Thread.sleep(3000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the flag and stop polling
        break;
    }
}

The run method from AudioThread:

public void run() {
    System.out.println("Running...");
    this.buildAsteriskMapping();
    this.connectToSFTP();
    ac = new AudioConvert();
    this.connectToS3();

    String downloadDir = "_rough/" + getNode() + "/" + Time.getYYYYMMDDDate() + "/";
    String encodeDir = "_completed/" + getNode() + "/" + Time.getYYYYMMDDDate() + "/";
    String uploadDir = getNode() + "/" + Time.getYYYYMMDDDate() + "/";

    System.out.println("Downloading...");
    try {
        sftp.get(filenames, downloadDir);
    } catch (SftpException e) {
        // download failed: there is nothing to encode, so stop here
        System.out.println("DL Failed...");
        e.printStackTrace();
        return;
    }

    System.out.println("Encoding...");
    try {
        ac.encodeWavToMP3(filenames, downloadDir, encodeDir);
    } catch (IllegalArgumentException | EncoderException e) {
        // encoding failed: don't upload missing or partial MP3s
        System.out.println("En Failed...");
        e.printStackTrace();
        return;
    }

    System.out.println("Uploading...");
    try {
        s3.upload(filenames, encodeDir, uploadDir);
    } catch (AmazonClientException e) {
        System.out.println("Up Failed...");
        e.printStackTrace();
    }

}

The download method:

public void get(ArrayList<String> src, String dest) throws SftpException {
    for(String file : src) {
        System.out.println(dest + file);
        channel.get(file, dest + file);
    }
}

The encode method:

public void encodeWavToMP3(ArrayList<String> filenames, String downloadDir, String encodeDir) throws IllegalArgumentException, EncoderException {
    for(String f : filenames) {
        File wav = new File(downloadDir + f);
        File mp3 = new File(encodeDir + wav.getName().replace(".wav", ".mp3"));
        encoder.encode(wav, mp3, attrs);
    }
}

The upload method:

public void upload(ArrayList<String> filenames, String encodeDir, String uploadDir) throws AmazonClientException, AmazonServiceException {
    for (String f : filenames) {
        // the encoder wrote an .mp3, so upload that file under its own key
        String mp3 = f.replace(".wav", ".mp3");
        s3.putObject(new PutObjectRequest(bucketName, uploadDir + mp3, new File(encodeDir + mp3)));
    }
}

The issue is that every thread keeps downloading the same files (or roughly the same files). I want to add a variable for each client that holds the files currently being downloaded, but I don't know how to remove the filenames from this variable once they are done. What would be a solution? My boss would also like to allow only x threads to run at a time.

It's kind of hard to see the problem, as the code that actually does the download is missing :P

However, I would use some kind of ExecutorService instead.

Basically, I would add each download request to the service (wrapped in a "DownloadTask" with a reference to the file to be downloaded and any other relevant information it might need to get the file) and let the service take care of the rest.
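A minimal sketch of that idea, assuming Java 8+ and using only `java.util.concurrent`. `Downloader` is a hypothetical stand-in for whatever actually fetches a file (for example, a wrapper around the SFTP `get`), and `DownloadService`, `DownloadTask`, and `downloadAll` are illustrative names, not part of any library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DownloadService {
    // Hypothetical hook for the real transfer (e.g. wrapping an SFTP get).
    interface Downloader {
        void download(String remoteName, String localPath) throws Exception;
    }

    // One unit of work: a single file plus everything needed to fetch it.
    static class DownloadTask implements Callable<String> {
        private final Downloader downloader;
        private final String remoteName;
        private final String destDir;

        DownloadTask(Downloader downloader, String remoteName, String destDir) {
            this.downloader = downloader;
            this.remoteName = remoteName;
            this.destDir = destDir;
        }

        @Override
        public String call() throws Exception {
            String localPath = destDir + remoteName;
            downloader.download(remoteName, localPath);
            return localPath; // tell the caller where the file ended up
        }
    }

    // Submits one task per file; the pool size caps how many downloads run at once.
    static List<String> downloadAll(List<String> names, String destDir, Downloader d, int maxThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : names) {
                futures.add(pool.submit(new DownloadTask(d, name, destDir)));
            }
            List<String> localPaths = new ArrayList<>();
            for (Future<String> f : futures) {
                localPaths.add(f.get()); // waits for that download; rethrows its failure
            }
            return localPaths;
        } finally {
            pool.shutdown(); // already-submitted tasks still run to completion
        }
    }
}
```

The fixed pool size is also one straightforward way to cover the "only x threads" requirement: extra tasks simply wait in the pool's queue.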

The download tasks could be written to take already-downloaded files into account, however you see fit.

Depending on your requirements, this could be a single-threaded or multi-threaded service. You could also submit upload requests to it.

Check out the Executors trail in the Java Tutorials for more info.

The general idea is to use a kind of producer/consumer pattern. You would have (at least) one thread that looks up all the files to be downloaded and, for each file, adds a task to the executor service. After a file has been downloaded, I would queue an upload request into the same service.
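One way to sketch that producer/consumer flow, assuming Java 8+: the loop over file names is the producer, and each follow-up step for a file is queued on the same pool only after the previous step for that file has finished. I used `CompletableFuture` for the chaining, which is my choice rather than something named above; `Pipeline`, `download`, and `encode` are hypothetical placeholders that only compute paths, and the "upload" step just records the result:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class Pipeline {
    // Hypothetical stages; real versions would call the SFTP, encoder and S3 code.
    static String download(String name) {
        return "_rough/" + name;
    }

    static String encode(String wavPath) {
        return wavPath.replace("_rough/", "_completed/").replace(".wav", ".mp3");
    }

    static List<String> run(List<String> names, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> uploaded = Collections.synchronizedList(new ArrayList<>());
        try {
            List<CompletableFuture<Void>> jobs = new ArrayList<>();
            for (String name : names) {
                // Producer: one job per file. Each stage is queued on the same pool
                // only once the previous stage for this file has completed.
                jobs.add(CompletableFuture
                        .supplyAsync(() -> download(name), pool)
                        .thenApplyAsync(Pipeline::encode, pool)
                        .thenAcceptAsync(uploaded::add, pool)); // stand-in for the upload
            }
            // Wait for every file's chain; a failed stage surfaces as a CompletionException.
            CompletableFuture.allOf(jobs.toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pool.shutdown();
        }
        List<String> result = new ArrayList<>(uploaded);
        Collections.sort(result); // completion order varies between runs
        return result;
    }
}
```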

This way, you avoid all the mess of synchronization and thread management :D

You could use the same idea for the scan tasks: for each client, you could submit a scan task to a separate service.
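A sketch of that, assuming Java 8+: one periodic scan task per client on a scheduled service, feeding a separate worker pool. `ScanService`, its `Client` interface, and the node-to-client map are hypothetical; the 3-second pause of the original loop becomes the `periodMillis` delay. On its own this does not stop a later scan from resubmitting a file that is still being worked on, so something still has to mark files as taken:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ScanService {
    // Hypothetical view of an SFTP client: just "list the ready files".
    interface Client {
        List<String> list() throws Exception;
    }

    // One scheduler thread runs every client's scan in turn; listings are assumed to be quick.
    private final ScheduledExecutorService scanners = Executors.newSingleThreadScheduledExecutor();
    private final ExecutorService workers;

    ScanService(int maxWorkers) {
        this.workers = Executors.newFixedThreadPool(maxWorkers);
    }

    // One periodic scan task per client replaces the while(true) + Thread.sleep loop.
    void start(Map<String, Client> clientsByNode, long periodMillis, Consumer<String> work) {
        clientsByNode.forEach((node, client) ->
                scanners.scheduleWithFixedDelay(() -> {
                    try {
                        for (String file : client.list()) {
                            workers.submit(() -> work.accept(node + "/" + file));
                        }
                    } catch (Exception e) {
                        // An exception escaping a periodic task cancels all its later runs.
                        e.printStackTrace();
                    }
                }, 0, periodMillis, TimeUnit.MILLISECONDS));
    }

    // Stop scanning first, so no scan can submit work after the workers shut down.
    void stop() throws InterruptedException {
        scanners.shutdown();
        scanners.awaitTermination(10, TimeUnit.SECONDS);
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```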

There is a problem in your code where you instantiate AudioThread inside the while loop.

Note that after you create a thread and call t.start(), all downloading, encoding, and uploading happens asynchronously. Therefore, after you start the thread, the loop continues and calls c.list(...) again (on the next pass, about three seconds later) while the first thread is still processing the first set of files. Most likely the same set of files is returned by the subsequent c.list() calls, since they use the same file pattern and there is no code that marks which files are currently being processed.
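The "mark which files are currently being processed" bookkeeping the question asks about can be sketched as a thread-safe set per client, assuming Java 8+. `InFlightTracker`, `claim`, and `release` are hypothetical names: the scanning loop passes each listing through `claim` and only hands the returned names to a worker, and the worker calls `release` when it is done with a file:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class InFlightTracker {
    // Names currently being downloaded, encoded or uploaded; safe to use from many threads.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    // Keeps only the listed names nobody is working on yet, and claims them in the same step.
    List<String> claim(List<String> listed) {
        List<String> claimed = new ArrayList<>();
        for (String name : listed) {
            if (inFlight.add(name)) { // atomic: returns false if the name is already claimed
                claimed.add(name);
            }
        }
        return claimed;
    }

    // Called by the worker when it is finished with a file, whether it succeeded or failed.
    void release(String name) {
        inFlight.remove(name);
    }
}
```

Putting the `release(name)` call in a `finally` block in the worker ensures a failed file does not stay claimed forever. Because releasing makes the name claimable again, a finished file that still matches the listing pattern will be picked up on the next scan, so this works best combined with renaming or moving processed files on the server.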

My suggestion:

  • Use Executors.newFixedThreadPool(int nThreads), as mentioned in the previous post, and set the number of threads to the number of processors in your machine. Do this before your while loop.
  • For each filename you retrieve with c.list(), create a Callable and pass them all to ExecutorService.invokeAll(Collection<? extends Callable<T>> tasks). The code in the Callable is your AudioThread code. Modify the AudioThread code to process only one file at a time (if possible); that way, downloads, encoding, and uploads for different files run in parallel.
  • Add code that marks which files have already been processed. I would suggest renaming each processed file so that it is not returned by the next c.list() call.
  • Call ExecutorService.shutdown() after your while loop.
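A sketch of those four steps, assuming Java 8+. `BatchProcessor`, `work`, and `markProcessed` are hypothetical: in the real program `work` would wrap the single-file version of the AudioThread code, and `markProcessed` would do the server-side rename so the next listing skips the file:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

class BatchProcessor {
    // Created once, before the polling loop; one thread per available processor.
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    // work: hypothetical per-file download + encode + upload.
    // markProcessed: hypothetical rename hook, only called if work succeeded.
    List<String> processBatch(List<String> filenames,
                              Function<String, String> work,
                              Consumer<String> markProcessed) throws InterruptedException {
        List<Callable<String>> tasks = new ArrayList<>();
        for (String name : filenames) {
            tasks.add(() -> {
                String result = work.apply(name);
                markProcessed.accept(name);
                return result;
            });
        }
        List<String> results = new ArrayList<>();
        // invokeAll blocks until every task has finished, so the caller's next
        // listing cannot overlap with this batch. Futures come back in task order.
        for (Future<String> f : pool.invokeAll(tasks)) {
            try {
                results.add(f.get());
            } catch (ExecutionException e) {
                e.getCause().printStackTrace(); // one bad file doesn't abort the batch
            }
        }
        return results;
    }

    // Call once, after the polling loop ends.
    void shutdown() {
        pool.shutdown();
    }
}
```

In the question's loop, this would replace creating an AudioThread per listing: pass what c.list(...) returned to `processBatch`, sleep, repeat, and call `shutdown()` once the loop exits. Because `invokeAll` waits for the whole batch, one slow file delays the next scan of that client.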
