
Download file from FTP server while the file is still being uploaded

I need to automate pulling (GET) files from a wide variety of FTP services, spread across different domains, that receive files on a 24/7 basis.

My problem is that FTP services, in general, allow a file to be downloaded while it is still being uploaded. This is one of the references to the problem that can be found on the internet.

This can lead to incomplete file downloads.

I tried to replicate the situation using a Windows server and the FileZilla FTP client and, as expected, got only half of the file, so no safety mechanism was in place to prevent this. So perhaps there is simply no way to prevent it from the client side.

So my question is whether there is some anchor, something my client can test, to know for sure that the FTP server already has the entire file.

I find it hard to believe that a protocol as old as FTP doesn't provide a safe mechanism, so either I am missing something or this is by design.

Update: I am developing the automation in C#, but any technical tip can help. The solution needs to be foolproof because it is business-critical.

Update 2: The uploads are made by many different clients, so it is impossible to establish a convention with all of them.

Update 3: This question is similar to the question "How to detect that a file is being uploaded over FTP", but has the additional restriction described in Update 2.

I created the following automated solution, based on input from the answers to this post and elsewhere, to address my problem as it stands: pulling files from different FTP servers, of different brands, in a scenario where concurrent access is very likely.

Using signal files or other mechanisms suggested in this post would require forcing clients to change the way they interact with us, so while that is a solution for most cases, it is not a solution for my particular problem.

So, my solution was:

  1. Scan the folder, parsing the file name, date and size of each file.
  2. Discard any file that is too new: a file is considered for download only if its date is more than a few minutes old. An upload that hangs can defeat this rule, so on its own it does not rule out concurrent access.
  3. Rename the file. If the rename fails, skip the file. This concurrency-based method has proven 100% accurate so far.
  4. Download the renamed file.
  5. Check the size of the transfer and see whether it matches the size attribute (paranoia check).
  6. Delete the successfully transferred file from the FTP server.

This solution allows us to poll FTP folders intensively.
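The six steps above can be sketched with Python's standard `ftplib` (the asker works in C#, but the FTP command sequence is the same). This is a minimal sketch, assuming the server supports `MLSD` (RFC 3659) so that name, size and modification time come from one listing; `MIN_AGE`, `CLAIM_SUFFIX`, `candidates` and `pull_one` are hypothetical names. The rename only acts as a lock if the server refuses to rename a file that is still open for writing, which is typical on Windows servers; on POSIX filesystems a rename generally succeeds even while the upload is still in progress.

```python
import ftplib
import io
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical settings, chosen for illustration only.
MIN_AGE = timedelta(minutes=5)   # step 2: "older than a few minutes"
CLAIM_SUFFIX = ".downloading"    # step 3: name given to a claimed file


def parse_mlsd_time(value: str) -> datetime:
    """MLSD 'modify' facts are YYYYMMDDHHMMSS[.sss] in UTC (RFC 3659)."""
    return datetime.strptime(value[:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def candidates(ftp, now: datetime) -> list[tuple[str, int]]:
    """Steps 1-2: read name, date and size; keep only files older than MIN_AGE."""
    result = []
    for name, facts in ftp.mlsd(facts=["type", "size", "modify"]):
        if facts.get("type") != "file" or name.endswith(CLAIM_SUFFIX):
            continue  # skip directories and files that are already claimed
        if now - parse_mlsd_time(facts["modify"]) >= MIN_AGE:
            result.append((name, int(facts["size"])))
    return result


def pull_one(ftp, name: str, expected_size: int) -> Optional[bytes]:
    claimed = name + CLAIM_SUFFIX
    # Step 3: claim the file by renaming it; if the server refuses, skip it.
    try:
        ftp.rename(name, claimed)
    except (ftplib.error_perm, ftplib.error_temp):
        return None
    # Step 4: download the renamed file in binary mode.
    buf = io.BytesIO()
    ftp.retrbinary("RETR " + claimed, buf.write)
    data = buf.getvalue()
    # Step 5: paranoia check against the size seen in the listing.
    if len(data) != expected_size:
        raise IOError(f"{name}: got {len(data)} bytes, expected {expected_size}")
    # Step 6: delete only after a successful, size-checked transfer.
    ftp.delete(claimed)
    return data
```

A polling loop would open `ftplib.FTP(host)`, log in, and call `pull_one` for each entry returned by `candidates(ftp, datetime.now(timezone.utc))`. A size mismatch raises and leaves the claimed file on the server for inspection instead of deleting it.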

I believe that from the client side, there's not much you can do.

At most, you could re-check the file size after some time, see whether it has changed, and take whatever steps are required to get the new content.
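A minimal sketch of that re-check with Python's `ftplib`, assuming the server supports the `SIZE` command; `size_is_stable` and the 60-second default are hypothetical, and the `sleep` parameter exists only so the wait can be replaced in tests. As the answer says, this cannot prove the upload is finished: a stalled upload looks exactly like a completed one.

```python
import time


def size_is_stable(ftp, name: str, wait_seconds: float = 60.0, sleep=time.sleep) -> bool:
    """Return True if the server reports the same size before and after a wait.

    A stable size only shows that the file did not grow during the window;
    it says nothing about an upload that has merely paused.
    """
    ftp.voidcmd("TYPE I")    # some servers refuse SIZE in ASCII mode
    before = ftp.size(name)  # SIZE comes from RFC 3659, not the original RFC 959
    sleep(wait_seconds)
    after = ftp.size(name)
    return before is not None and before == after
```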

FTP was not designed as a protocol for real-time exchange of data between two clients through an FTP server. A client gets no notification that a file it wants to download is still being uploaded, nor is there any indication, when a file is overwritten, that somebody is currently downloading it. This is not a design error in the FTP protocol; the real problem is that you are trying to use a protocol for a purpose it was not designed for.

So you have this scenario:

[Publisher] --uploads file--> [FTP Server] --downloads file--> [You]

You have a publisher who is uploading files to an FTP server, and you download from the same FTP server. There can also be different FTP Server instances, one for upload and one for download, looking at the same directory, but that doesn't change much.

Now, because you're looking at the same directory, you, the downloader, see files as soon as the filesystem entry is created, possibly while the publisher's first bytes are still in flight.

There are basically three solutions for this:

  • Sentinel files, written by the FTP server or a plugin. Either a "$originalFileName.lock" that exists while the file is being uploaded, or a "$originalFileName.done", that is written when the upload successfully completes.
  • Moving files to different directories: the FTP server moves the files from the upload directory where the publisher writes to the download directory from which you read.
  • The least stable: check the file size and time. When you start a download, remember the timestamp and size of the file that the FTP server reports. When you're done downloading the file, compare the values the server reports now against the remembered ones. When they don't match, resume the download from where you finished to obtain the remaining bytes, ad infinitum. You can, for example, decide that "a file is successfully uploaded if it hasn't grown in size for five minutes", but that's not very robust, and it can cause you to wait five minutes for nothing.
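The third option, resuming until the file stops changing, could look roughly like this with Python's `ftplib`. It is a sketch assuming the server supports `SIZE`, `MDTM` and `REST` in stream mode (RFC 3659); `download_until_quiet` and its timing defaults are hypothetical, with five minutes mirroring the example above. It inherits the weakness already described: an upload that stalls for longer than the quiet period is returned truncated.

```python
import io
import time


def download_until_quiet(ftp, name: str, quiet_seconds: float = 300.0,
                         poll_seconds: float = 30.0,
                         sleep=time.sleep, clock=time.monotonic) -> bytes:
    """Download `name`, resuming as it grows, until its SIZE and MDTM have
    stayed unchanged for `quiet_seconds`."""
    ftp.voidcmd("TYPE I")  # binary mode, so SIZE and REST offsets are byte counts
    buf = io.BytesIO()
    last_seen = None
    last_change = clock()
    while True:
        # Remember the size and timestamp the server reports right now.
        size = ftp.size(name)
        stamp = ftp.sendcmd("MDTM " + name)[4:].strip()  # reply is "213 YYYYMMDDHHMMSS"
        offset = buf.tell()
        if size < offset:
            raise IOError(f"{name} shrank from {offset} to {size} bytes; it was probably replaced")
        if (size, stamp) != last_seen:
            last_seen, last_change = (size, stamp), clock()
        elif clock() - last_change >= quiet_seconds:
            return buf.getvalue()
        if size > offset:
            # Resume from where the previous pass stopped (ftplib sends REST <offset>).
            ftp.retrbinary("RETR " + name, buf.write, rest=offset or None)
        sleep(poll_seconds)
```

The `sleep` and `clock` parameters are there only so the waiting can be simulated; in production the defaults apply.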
