How to copy files as fast as possible?
I am running my shell script on machineA, which copies files from machineB and machineC to machineA. If a file is not there on machineB, then it is sure to be on machineC, so I will first try to copy each file from machineB and, if it is not there, go to machineC to copy the same file.
On machineB and machineC there is a date-named folder in YYYYMMDD format inside this directory -

/data/pe_t1_snapshot

Whatever the latest date in YYYYMMDD format inside that directory is, I pick that folder as the full path from which I need to start copying the files. So, for example, if the latest date folder inside /data/pe_t1_snapshot is 20140317, then the full path for me is -

/data/pe_t1_snapshot/20140317

and that is where I start copying files from on machineB and machineC. I need to copy around 400 files to machineA from machineB and machineC, and each file is 1.5 GB in size.
Currently I have the shell script below, which works fine since I am using scp, but somehow it takes ~2 hours to copy the 400 files to machineA, which is too long for me I guess. :(

Below is my shell script -
#!/bin/bash
readonly PRIMARY=/export/home/david/dist/primary
readonly SECONDARY=/export/home/david/dist/secondary
readonly FILERS_LOCATION=(machineB machineC)
readonly MEMORY_MAPPED_LOCATION=/data/pe_t1_snapshot
PRIMARY_PARTITION=(0 3 5 7 9) # this will have more file numbers around 200
SECONDARY_PARTITION=(1 2 4 6 8) # this will have more file numbers around 200
dir1=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[0]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)
dir2=$(ssh -o "StrictHostKeyChecking no" david@${FILERS_LOCATION[1]} ls -dt1 "$MEMORY_MAPPED_LOCATION"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] | head -n1)
echo $dir1
echo $dir2
if [ "$dir1" = "$dir2" ]
then
# delete all the files first
find "$PRIMARY" -mindepth 1 -delete
for el in "${PRIMARY_PARTITION[@]}"
do
scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.
done
# delete all the files first
find "$SECONDARY" -mindepth 1 -delete
for sl in "${SECONDARY_PARTITION[@]}"
do
scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.
done
fi
I am copying the PRIMARY_PARTITION files into the PRIMARY folder and the SECONDARY_PARTITION files into the SECONDARY folder on machineA.
Is there any way to move the files to machineA faster? Can I copy 10 files at a time, or 5 files at a time, in parallel to speed up this process, or is there any other approach?
NOTE: machineA is running on SSD.
UPDATE: Here is the parallel shell script I tried; the top portion is the same as shown above.
if [ "$dir1" = "$dir2" ] && [ "$length1" -gt 0 ] && [ "$length2" -gt 0 ]
then
find "$PRIMARY" -mindepth 1 -delete
for el in "${PRIMARY_PARTITION[@]}"
do
(scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$el"_200003_5.data $PRIMARY/.) &
WAITPID="$WAITPID $!"
done
find "$SECONDARY" -mindepth 1 -delete
for sl in "${SECONDARY_PARTITION[@]}"
do
(scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[0]}:$dir1/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/. || scp -o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900 david@${FILERS_LOCATION[1]}:$dir2/t1_weekly_1680_"$sl"_200003_5.data $SECONDARY/.) &
WAITPID="$WAITPID $!"
done
wait $WAITPID
echo "All files done copying."
fi
Errors I got with the parallel shell script -
channel 24: open failed: administratively prohibited: open failed
channel 25: open failed: administratively prohibited: open failed
channel 26: open failed: administratively prohibited: open failed
channel 28: open failed: administratively prohibited: open failed
channel 30: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
channel 32: open failed: administratively prohibited: open failed
channel 36: open failed: administratively prohibited: open failed
channel 37: open failed: administratively prohibited: open failed
channel 38: open failed: administratively prohibited: open failed
channel 40: open failed: administratively prohibited: open failed
channel 46: open failed: administratively prohibited: open failed
channel 47: open failed: administratively prohibited: open failed
channel 49: open failed: administratively prohibited: open failed
channel 52: open failed: administratively prohibited: open failed
channel 54: open failed: administratively prohibited: open failed
channel 55: open failed: administratively prohibited: open failed
channel 56: open failed: administratively prohibited: open failed
channel 57: open failed: administratively prohibited: open failed
channel 59: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
channel 61: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
mux_client_request_session: session request failed: Session open refused by peer
channel 64: open failed: administratively prohibited: open failed
mux_client_request_session: session request failed: Session open refused by peer
channel 68: open failed: administratively prohibited: open failed
channel 72: open failed: administratively prohibited: open failed
channel 74: open failed: administratively prohibited: open failed
channel 76: open failed: administratively prohibited: open failed
channel 78: open failed: administratively prohibited: open failed
You can try the rsync command. From man rsync you will see that: "The rsync remote-update protocol allows rsync to transfer just the differences between two sets of files across the network connection, using an efficient checksum-search algorithm described in the technical report that accompanies this package."
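A minimal sketch of what that looks like for one of the snapshot directories from the question (the flags are a common starting point, not a tuned recommendation):

```shell
# -a preserves permissions/timestamps, -v is verbose, and --partial keeps
# partially transferred files so an interrupted 1.5 GB copy can resume
# instead of restarting from zero.
rsync -av --partial \
    david@machineB:/data/pe_t1_snapshot/20140317/ \
    /export/home/david/dist/primary/
```

Add -z only if the data actually compresses (these .data files may not), since compression costs CPU on both ends.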
You may try HPN-SSH (High Performance SSH/SCP) - http://www.psc.edu/index.php/hpn-ssh or http://hpnssh.sourceforge.net/

The HPN-SSH project is a set of patches for OpenSSH (scp is part of it) that better tune various TCP and internal buffers. There is also a "none" cipher ("None Cipher Switching") which disables encryption, and this may help you too (if you don't use public networks to send the data).

Both compression and encryption consume CPU time, and on 10 Gbit Ethernet it can sometimes be faster to transfer an uncompressed file than to wait for the CPU to compress and encrypt it.
You may profile your setup:

- Measure the network bandwidth between the machines with iperf or netperf, and compare it with the actual network (network-card capabilities, switches). With a good setup you should get more than 80-90 percent of the declared speed.
- Calculate the amount of data and the time needed to transfer that much data over your network at the iperf/netperf speed. Compare with the actual transfer time - is there a huge difference?
- Check CPU and disk load with top, vmstat, iostat:
  - Is any CPU core loaded at 100% (run top and press 1 to see the cores)?
  - Are there too many interrupts (in) in vmstat 1? What about context switches (cs)?
  - What is the file-read rate in iostat 1? Are your HDDs fast enough to read the data on the senders, and to write it on the receiver?
- You can try full-system profiling with perf top or perf record -a. Is a lot of the computing done by scp, or by the network stack in Linux? If you can install dtrace or ktap, also try off-cpu profiling.

You have 1.5 GB * 400 = 600 GB of data. Unrelated to the question of speed, I'd suggest that the machine setup looks incorrect if you need to transfer this amount of data: you probably needed to generate it on machine A in the first place.
600 GB of data transferred in 2 hours is an ~85 MB/s transfer rate, which means you probably reached the transfer limit of either your disk drives or (almost) the network. I believe you won't be able to transfer faster with any other command.
If the machines are close to each other, the fastest copying method I can think of is to physically remove the storage from machines B and C, put it in machine A, and then copy locally without transferring over the network. The cost is the time to move the storage around, plus disk transfer times. I'm afraid, however, that the copy won't be much faster than 85 MB/s.

The fastest network transfer command I know of is netcat, because it has no encryption overhead. Additionally, if the files are not media files, you should compress them with a compressor that compresses faster than 85 MB/s; I know lzop and lz4 can sustain more than this rate. So my command line for transferring a single directory would be (BSD netcat syntax):
machine A:
$ nc -l 2000 | lzop -d | tar x
machine B or C (can be executed from machine A with the help of ssh):
$ tar c directory | lzop | nc machineA 2000
Remove the compressor if transferring media files, which are already compressed.

The commands to organize your directory structure are irrelevant in terms of speed, so I didn't bother to write them here, but you can reuse your own code.

This is the fastest method I can think of, but, again, I don't believe it will be much faster than what you already have.
You definitely want to give rclone a try. This thing is crazy fast:

sudo rclone sync /usr /home/fred/temp -P -L --transfers 64

Transferred: 17.929G / 17.929 GBytes, 100%, 165.692 MBytes/s, ETA 0s
Errors: 75 (retrying may help)
Checks: 691078 / 691078, 100%
Transferred: 345539 / 345539, 100%
Elapsed time: 1m50.8s

This is a local copy from and to a LITEONIT LCS-256 (256GB) SSD.
rsync optionally compresses its data, which typically makes the transfer go much faster.

You didn't mention it, but scp -C also compresses.

Do note that compression might make the transfer faster or slower, depending on the speed of your CPU and of your network link: slower links and a faster CPU make compression a good idea; faster links and a slower CPU make it a bad one. As with any optimization, measure the results in your own environment.
Also, I think FTP is another option for you: in my transfer-speed tests with large files (>10 MB), FTP worked faster than scp and even rsync (it depends on the file format and compression rate).
rsync is a good answer, but if you care about security then you should consider using rdist.

Some details on the differences between rsync and rdist can be found here: rdist vs rsync, and a blog about how to set it up using ssh can be found here: non-root remote updating.
Finally, you could use the infamous tar pipe pattern, with a sprinkle of ssh.
tar zcvf - /wwwdata | ssh root@dumpserver.nixcraft.in "cat > /backup/wwwdata.tar.gz"
This example is discussed here: tar copy over secure network.
The remote side doesn't support that much ssh multiplexing: the server's sshd caps the number of multiplexed sessions per connection (MaxSessions, default 10), so the extra parallel sessions are refused.

To silence the message:

mux_client_request_session: session request failed: Session open refused by peer

change your ~/.ssh/config file:
Host destination.hostname.com
ControlMaster no
Host *
ControlMaster auto
ControlPersist yes
ControlPath ~/.ssh/socket-%r@%h:%p