How do I efficiently transfer a huge number of files from my Python clients to a server and back?

I have around 100 clients (mostly Windows machines, with one or two Macs/Ubuntu boxes) and I need to sync a huge number of files between the clients by means of a central server, which does almost no work on the synced files (mostly managing access rights).

For now I see two solutions available:

  1. Use XML-RPC. It looks great, but I'm not sure about the performance; from what I've googled, the performance of this approach is subpar.

  2. Use paramiko and copy files via FTP or SCP. I don't like that solution because I'm storing the files in Riak, so it would double the I/O work on the server side: first write the file to disk, then read it back from disk, and finally write it to Riak.

Is there a third approach, like using sockets and writing the file-transfer code myself? Is there an asynchronous XML-RPC server, and do I need one for my task?
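To make the third option concrete, here is a minimal sketch of the kind of hand-rolled socket transfer I have in mind (the length-prefixed framing and the chunk size are just placeholders, not a finished protocol):

    import pathlib
    import socket
    import struct

    CHUNK = 64 * 1024  # arbitrary chunk size

    def recv_exact(sock: socket.socket, n: int) -> bytes:
        """Read exactly n bytes, or raise if the peer closes early."""
        buf = b""
        while len(buf) < n:
            part = sock.recv(n - len(buf))
            if not part:
                raise ConnectionError("peer closed mid-transfer")
            buf += part
        return buf

    def send_file(sock: socket.socket, path: pathlib.Path) -> None:
        """Frame one file as an 8-byte big-endian length, then raw bytes."""
        sock.sendall(struct.pack(">Q", path.stat().st_size))
        with path.open("rb") as f:
            while chunk := f.read(CHUNK):
                sock.sendall(chunk)

    def recv_file(sock: socket.socket, dest: pathlib.Path) -> None:
        """Receive one file framed by send_file()."""
        (size,) = struct.unpack(">Q", recv_exact(sock, 8))
        with dest.open("wb") as f:
            while size > 0:
                part = recv_exact(sock, min(CHUNK, size))
                f.write(part)
                size -= len(part)

Even this toy version already needs framing and partial-read handling, which is part of why I'm hesitant to roll my own.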

Operations during file transfer (see the sketch after this list):

  1. Authentication of the uploading user.

  2. Checking the user's disk quota.

  3. Rule-based access rights management (who can read/write each file/directory).

  4. Placing files in Riak, because a certain level of fault tolerance is needed.
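To illustrate, here is a rough sketch of how those four steps could be chained on the server, assuming the official riak Python client; authenticate, check_quota and may_write are hypothetical placeholders for our own auth/quota/ACL logic:

    import riak  # official Riak Python client

    client = riak.RiakClient(pb_port=8087)  # Protocol Buffers port is an assumption
    files = client.bucket("files")

    def authenticate(user):            # placeholder: real authentication goes here
        return True

    def check_quota(user, nbytes):     # placeholder: real quota check goes here
        return True

    def may_write(user, path):         # placeholder: real rule-based ACL check goes here
        return True

    def handle_upload(user, path, data):
        """Hypothetical pipeline covering steps 1-4 above."""
        if not authenticate(user):                 # 1. authenticate uploader
            raise PermissionError("bad credentials")
        if not check_quota(user, len(data)):       # 2. enforce disk quota
            raise OSError("quota exceeded")
        if not may_write(user, path):              # 3. check access rights
            raise PermissionError("no write access to " + path)
        obj = files.new(path, encoded_data=data,   # 4. store in Riak, which
                        content_type="application/octet-stream")
        obj.store()                                #    replicates for fault tolerance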

As I see it, this application is actually closer to Dropbox than to rsync. We would actually use the Dropbox API, but this storage needs to be integrated deeply with our other systems, so we wanted more control over it.

The first thing that comes to mind when you say "sync a huge number of files" is rsync. In case you don't know that tool: it lets you sync directories efficiently, both locally and remotely. It can be configured to skip unchanged files, which makes it very efficient.
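For example, the whole client-side sync can be a single rsync invocation; from Python you could just shell out to it (the host, user and paths below are placeholders):

    import subprocess

    # -a: archive mode (recurse, preserve permissions and times)
    # -z: compress data in transit
    # --delete: drop server-side files the client no longer has
    subprocess.run(
        ["rsync", "-az", "--delete",
         "/local/project/",  # trailing slash: sync the directory's contents
         "syncuser@server.example.com:/srv/project/"],
        check=True,
    )

The skipping of unchanged files is what makes repeated runs cheap: rsync compares sizes and modification times (or checksums) and only transfers the differences.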

Now, when you say that the server "does almost no work on synced files", what is "almost"? If there is nothing to do on the files, you can use rsync. If there actually is some heavy computation on the files, its cost will probably dwarf the cost of transferring them, so I/O is not your bottleneck and you can use any tool for it without degrading performance.

Now, if you can mirror the files on the server and apply the various modifications there, you could use rsync to transfer them efficiently. That would let you build on proven infrastructure instead of reinventing the file-transfer wheel. I must stress, though, that I don't understand from your description what exactly you are doing; if you described the requirements in a bit more detail, there might be a better or different answer.

Edit, in response to the updated question:

There are Python rsync bindings that should let you sync even from MS Windows systems. They don't mention OS X, but since that is rather close to POSIX, chances are high that it works without too much hassle. On the server side, you just monitor the local filesystem for changes (check out e.g. iwatch) and then commit the differences to your DB. Using these two should get you started; if the performance later turns out to be insufficient, you could hook into the rsync server (it is open source) and trigger the DB updates from there without going through the filesystem.
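On the monitoring side, iwatch is a Perl tool; if you prefer to stay in Python, the watchdog package (my suggestion, not something the bindings require) can do the same job. A minimal sketch, with commit_to_db standing in for whatever writes the change into your database:

    import time
    from watchdog.observers import Observer
    from watchdog.events import FileSystemEventHandler

    def commit_to_db(path):
        """Placeholder: write the changed file into the database here."""
        print("would commit", path)

    class CommitChanges(FileSystemEventHandler):
        """Forward every file change under the rsync target to the DB."""
        def on_modified(self, event):
            if not event.is_directory:
                commit_to_db(event.src_path)

        on_created = on_modified  # treat newly created files the same way

    observer = Observer()
    observer.schedule(CommitChanges(), "/srv/project", recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)  # Observer runs in its own thread; keep the main thread alive
    finally:
        observer.stop()
        observer.join()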
