
How to copy files as fast as possible?

I am running a shell script on machineA that copies files from machineB and machineC to machineA.

If a file is not on machineB, then it should definitely be on machineC. So I try to copy from machineB first, and if a file is not there, I go to machineC to copy the same file.

On machineB and machineC there is a folder named after a date in YYYYMMDD format inside this directory -

/data/pe_t1_snapshot

Whichever YYYYMMDD folder inside the above directory has the latest date is the one I pick as the full path from which to start copying the files -

So, for example, if 20140317 is the latest date folder inside /data/pe_t1_snapshot, then this will be the full path for me -

/data/pe_t1_snapshot/20140317

from which I need to start copying the files on machineB and machineC. I need to copy around 400 files to machineA from machineB and machineC, and each file is 1.5 GB.

Currently I have the shell script below, which works fine since I am using scp, but somehow it takes ~2 hours to copy the 400 files to machineA, which I guess is too long. :(

Below is my shell script -

#!/bin/bash

readonly PRIMARY=/export/home/david/dist/primary
readonly SECONDARY=/export/home/david/dist/secondary
readonly FILERS_LOCATION=(machineB machineC)
readonly MEMORY_MAPPED_LOCATION=/data/pe_t1_snapshot
PRIMARY_PARTITION=(0 3 5 7 9) # this will have more file numbers around 200
SECONDARY_PARTITION=(1 2 4 6 8) # this will have more file numbers around 200

dir1=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[0]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)
dir2=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[1]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)

echo $dir1
echo $dir2

if [ "$dir1" = "$dir2" ]
then
    # delete all the files first
    find "$PRIMARY" -mindepth 1 -delete
    for el in "${PRIMARY_PARTITION[@]}"
    do
        scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.
    done

    # delete all the files first
    find "$SECONDARY" -mindepth 1 -delete
    for sl in "${SECONDARY_PARTITION[@]}"
    do
        scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.
    done
fi

I am copying the PRIMARY_PARTITION files into the PRIMARY folder and the SECONDARY_PARTITION files into the SECONDARY folder on machineA.

Is there any way to move the files to machineA faster? Can I copy 10 files at a time, or 5 files at a time, in parallel to speed up this process, or is there any other approach?

NOTE: machineA is running on an SSD.

UPDATE:

Here is the parallel shell script I tried; the top portion is the same as shown above.

if [ "$dir1" = "$dir2" ] && [ "$length1" -gt 0 ] && [ "$length2" -gt 0 ]
then
    find "$PRIMARY" -mindepth 1 -delete
    for el in "${PRIMARY_PARTITION[@]}"
    do
        (scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.) &
          WAITPID="$WAITPID $!"        
    done

    find "$SECONDARY" -mindepth 1 -delete
    for sl in "${SECONDARY_PARTITION[@]}"
    do
        (scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.) &
          WAITPID="$WAITPID $!"        
    done
     wait $WAITPID
     echo "All files done copying."
fi

Errors I got with the parallel shell script -

channel 24: open failed: administratively prohibited: open failed
channel 25: open failed: administratively prohibited: open failed
channel 26: open failed: administratively prohibited: open failed
channel 28: open failed: administratively prohibited: open failed
channel 30: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
channel 32: open failed: administratively prohibited: open failed
channel 36: open failed: administratively prohibited: open failed
channel 37: open failed: administratively prohibited: open failed
channel 38: open failed: administratively prohibited: open failed
channel 40: open failed: administratively prohibited: open failed
channel 46: open failed: administratively prohibited: open failed
channel 47: open failed: administratively prohibited: open failed
channel 49: open failed: administratively prohibited: open failed
channel 52: open failed: administratively prohibited: open failed
channel 54: open failed: administratively prohibited: open failed
channel 55: open failed: administratively prohibited: open failed
channel 56: open failed: administratively prohibited: open failed
channel 57: open failed: administratively prohibited: open failed
channel 59: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
channel 61: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
channel 64: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
channel 68: open failed: administratively prohibited: open failed
channel 72: open failed: administratively prohibited: open failed
channel 74: open failed: administratively prohibited: open failed
channel 76: open failed: administratively prohibited: open failed
channel 78: open failed: administratively prohibited: open failed

You can try this command:

rsync

From the

man rsync

you will see that: The rsync remote-update protocol allows rsync to transfer just the differences between two sets of files across the network connection, using an efficient checksum-search algorithm described in the technical report that accompanies this package.
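For the question's setup, a minimal sketch of what one scp call would look like as an rsync call (hostnames and paths are taken from the question; --partial lets an interrupted 1.5 GB transfer resume, and rsync runs over ssh by default):

rsync -a --partial david@machineB:"$dir1"/t1_weekly_1680_"$el"_200003_5.data "$PRIMARY"/ \
  || rsync -a --partial david@machineC:"$dir2"/t1_weekly_1680_"$el"_200003_5.data "$PRIMARY"/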

You may try HPN-SSH (High Performance SSH/SCP) - http://www.psc.edu/index.php/hpn-ssh or http://hpnssh.sourceforge.net/

The HPN-SSH project is a set of patches for OpenSSH (scp is part of it) that better tune various TCP and internal buffers. There is also a "none" cipher ("None Cipher Switching") which disables encryption, and this may help you too (if you don't use public networks to send the data).

Both compression and encryption consume CPU time, and on 10 Gbit Ethernet it can sometimes be faster to transfer an uncompressed file than to wait for the CPU to compress and encrypt it.
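If both ends are running an HPN-SSH build, the "none" cipher is usually enabled per connection with options along the following lines; treat this as a sketch, since these options are not part of stock OpenSSH and their names vary between HPN-SSH versions:

# hypothetical: requires HPN-SSH on both client and server
scp -oNoneEnabled=yes -oNoneSwitch=yes david@machineB:"$dir1"/t1_weekly_1680_"$el"_200003_5.data "$PRIMARY"/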

You may profile your setup (example commands are sketched after this list):

  • Measure the network bandwidth between machines using iperf or netperf. Compare with the actual network (network card capabilities, switches). With a good setup you should get more than 80-90 percent of the declared speed.
  • Calculate the data volume and the time needed to transfer that much data over your network, using the speed measured with iperf or netperf. Compare with the actual transfer time; is there a huge difference?
    • If your CPU is fast, the data is compressible and the network is slow, compression will help you.
  • Take a look at top, vmstat, iostat.
    • Are there 100% loaded CPU cores (run top and press 1 to see cores)?
    • Are there too many interrupts (in) in vmstat 1? What about context switches (cs)?
    • What is the file reading speed in iostat 1? Are your HDDs fast enough to read the data, and to write the data on the receiver?
  • You can try to do full-system profiling using perf top or perf record -a. Is a lot of time spent in scp, or in the Linux network stack? If you can install dtrace or ktap, try off-CPU profiling as well.
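A minimal sketch of those measurements, assuming iperf, vmstat and iostat are installed on the machines from the question:

# network bandwidth between machineA and machineB
iperf -s                    # run on machineB
iperf -c machineB -t 30     # run on machineA

# CPU load, interrupts (in) and context switches (cs), once per second
vmstat 1

# per-device read/write throughput, once per second
iostat -x 1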

You have 1.5 GB * 400 = 600 GB of data. Unrelated to the answer, I suggest that the machine setup looks incorrect if you need to transfer this amount of data; you probably needed to generate this data on machine A in the first place.

600 GB of data transferred in 2 hours is a ~85 MB/s transfer rate, which means you have probably reached the transfer limits of either your disk drives or (almost) the network. I believe you won't be able to transfer faster with any other command.

If the machines are close to each other, the fastest copying method I can think of is to physically remove the storage from machines B and C, put it in machine A, and then copy locally without transferring over the network. The time for this is the time to move the storage around, plus disk transfer time. I'm afraid, however, that the copy won't be much faster than 85 MB/s.

The network transfer command that I believe would be the fastest is netcat, because it has no encryption overhead. Additionally, if the files are not media files, you should compress them with a compressor that compresses faster than 85 MB/s; I know of lzop and lz4, which are reckoned to be faster than this rate. So my command line for transferring a single directory would be (BSD netcat syntax):

machine A:

$ nc -l 2000 | lzop -d | tar x

machine B or C (can be executed from machine A with the help of ssh):

$ tar c directory | lzop | nc machineA 2000

Remove the compressor if transferring media files, which are already compressed.
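A sketch of driving both ends from machine A over ssh, assuming lzop and nc are installed on both sides and that machineB can reach machineA by that hostname (port 2000 is arbitrary):

# on machine A: listen, decompress and unpack into the primary folder
nc -l 2000 | lzop -d | tar x -C "$PRIMARY" &

# still on machine A: tell machineB to pack, compress and send the snapshot dir
ssh david@machineB "tar c -C $dir1 . | lzop | nc machineA 2000"
wait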

The commands to organize your directory structure are irrelevant in terms of speed, so I didn't bother to write them here, but you can reuse your own code.

This is the fastest method I can think of, but, again, I don't believe this command will be much faster than what you already have.

You definitely want to give rclone a try. This thing is crazy fast:

sudo rclone sync /usr /home/fred/temp -P -L --transfers 64

Transferred:   17.929G / 17.929 GBytes, 100%, 165.692 MBytes/s, ETA 0s
Errors:        75 (retrying may help)
Checks:        691078 / 691078, 100%
Transferred:   345539 / 345539, 100%
Elapsed time:  1m50.8s

This is a local copy from and to a LITEONIT LCS-256 (256GB) SSD.
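rclone can also pull over SSH with its sftp backend, which fits the question's setup; a sketch, assuming an sftp remote named machineB has first been configured with rclone config (the transfer count is arbitrary):

rclone copy machineB:/data/pe_t1_snapshot/20140317 /export/home/david/dist/primary --transfers 16 -P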

rsync optionally compresses its data. That typically makes the transfer go much faster.

You didn't mention SCP, but SCP -C also compresses.

Do note that compression might make the transfer go faster or slower, depending upon the speed of your CPU and of your network link.

Slower links and a faster CPU make compression a good idea; faster links and a slower CPU make compression a bad idea.

As with any optimization, measure the results in your own environment.
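One quick way to measure it for this workload is to time a single representative file both ways; a sketch using one of the question's file names (partition 0), with /tmp as an arbitrary local target:

time scp    david@machineB:"$dir1"/t1_weekly_1680_0_200003_5.data /tmp/
time scp -C david@machineB:"$dir1"/t1_weekly_1680_0_200003_5.data /tmp/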

Also, I think FTP is another option for you: in my transfer speed tests with large files (>10 MB), FTP worked faster than scp and even rsync (it depends on the file format and compression rate).
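If an FTP server were available on machineB, one way to try this would be a parallel mirror with lftp; a sketch, where the remote path is the question's, the parallel count is arbitrary, and both the FTP server and the lftp client are assumptions rather than something the machines are known to have:

lftp -u david -e "mirror --parallel=8 /data/pe_t1_snapshot/20140317 /export/home/david/dist/primary; quit" machineB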

rsync is a good answer, but if you care about security then you should consider using:

rdist

Some details on the differences between rsync and rdist can be found here: rdist vs rsync, and a blog about how to set it up using ssh can be found here: non-root remote updating.

Finally, you could use the infamous tar pipe pattern, with a sprinkle of ssh.

tar zcvf - /wwwdata | ssh root@dumpserver.nixcraft.in "cat > /backup/wwwdata.tar.gz"
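For the direction in the question (pulling from machineB onto machineA), the same pattern can be turned around; a sketch using the question's variables:

ssh david@machineB "tar czf - -C $dir1 ." | tar xzf - -C "$PRIMARY"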

This example is talked about here: tar copy over secure network

The remote doesn't support ssh multiplexing.

To silence the message:

mux_client_request_session: session request failed: Session open refused by peer

Change your ~/.ssh/config file: 更改~/.ssh/config文件:

Host destination.hostname.com
  ControlMaster no

Host *
  ControlMaster auto
  ControlPersist yes
  ControlPath ~/.ssh/socket-%r@%h:%p

More details and notes can be found here.
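Alternatively, the mux errors disappear if fewer sessions share one multiplexed connection at a time. Below is a sketch of throttling the question's parallel loop to at most 10 concurrent scp processes; the limit of 10 is an assumption, chosen to stay under typical sshd per-connection session limits:

MAX_JOBS=10
for el in "${PRIMARY_PARTITION[@]}"
do
    # block until one of the running copies finishes and frees a slot
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
        sleep 1
    done
    scp david@${FILERS_LOCATION[0]}:"$dir1"/t1_weekly_1680_"$el"_200003_5.data "$PRIMARY"/ &
done
wait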
