Using rsync to copy a sequential range of files

Sorry if this makes no sense, but I will try to give all the information needed!

I would like to use rsync to copy a range of sequentially numbered files from one folder to another.

I am archiving a DCDM (it's a film thing) that contains on the order of 600,000 individually numbered, sequential .tif image files (~10 MB each).

I need to break this up to archive it properly onto LTO6 tapes, and I would like to use rsync to prep the folders so that my simple bash .sh file can automate the various folders and files that I want to back up to tape.

The command I normally use when running rsync is:

sudo rsync -rvhW --progress --size-only <src> <dest>

I use sudo if needed, and I always test the outcome first with --dry-run.

The only way I've got anything to work (without kicking out errors) is by using the * wildcard. However, this only handles files matching the set pattern (e.g. 01* will only move files in the range 010000 - 019999), and I would have to repeat it for 02, 03, 04, etc.

I've looked on the internet and am struggling to find an answer that works.

This might not be possible, and with 600,000 .tif files, I can't write an exclude for each one!

Any thoughts as to how (if at all) this could be done?

Owen.

You can check for file names starting with a digit by using pattern matching:

for file in [0-9]*; do
    # do something to $file name that starts with digit
done

Or, you could enable the extglob option and loop over all file names that contain only digits. This eliminates any unwanted files that start with a digit but contain non-digits after the first character.

shopt -s extglob
for file in +([0-9]); do
    # do something to $file name that contains only digits
done
  • +([0-9]) matches one or more occurrences of a digit

Update:

Based on the file name pattern in your recent comment:

shopt -s extglob
for file in legendary_dcdm_3d+([0-9]).tif; do
    # do something to $file
done
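
To pick out a numeric range rather than every matching file, one possible sketch is to strip the fixed prefix and extension from each name and compare what is left against the bounds. The function name files_in_range is hypothetical, and the prefix is simply the one from the pattern above. The 10# prefix forces base 10, because bash would otherwise read a zero-padded number such as 000089 as octal.

```bash
#!/usr/bin/env bash
# Sketch: print names matching legendary_dcdm_3d<digits>.tif whose number
# lies in [first, last]. files_in_range is a hypothetical helper name.
shopt -s extglob nullglob   # nullglob: an unmatched pattern expands to nothing

files_in_range() {
    # 10# makes the bounds decimal even if they are passed zero-padded
    local dir=$1 first=$((10#$2)) last=$((10#$3)) path name num
    for path in "$dir"/legendary_dcdm_3d+([0-9]).tif; do
        name=${path##*/}                # drop the directory part
        num=${name#legendary_dcdm_3d}   # drop the fixed prefix
        num=${num%.tif}                 # drop the extension
        if (( 10#$num >= first && 10#$num <= last )); then
            printf '%s\n' "$name"
        fi
    done
}
```

Calling files_in_range with a directory and two bounds prints one matching name per line, which can be redirected to a file or read in a while loop.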

Globbing is the shell feature that expands a wildcard into a list of matching file names. You have already used it in your question.

For the following explanations, I will assume we are in a directory with the following files:

$ ls -l
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 file.txt
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 funny_cat.jpg
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2013-1.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2013-2.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2013-3.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2013-4.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2014-1.pdf
-rw-r----- 1 5gon12eder staff 0 Sep  8 17:26 report_2014-2.pdf

The simplest case is to match all files. The following makes for a poor man's ls.

$ echo *
file.txt funny_cat.jpg report_2013-1.pdf report_2013-2.pdf report_2013-3.pdf report_2013-4.pdf report_2014-1.pdf report_2014-2.pdf

If we want to match all reports from 2013, we can narrow the match:

$ echo report_2013-*.pdf
report_2013-1.pdf report_2013-2.pdf report_2013-3.pdf report_2013-4.pdf

We could, for example, have left out the .pdf part, but I like to be as specific as possible.

You have already come up with a solution that uses this for selecting a range of numbered files. For example, we can match reports by quarter:

$ for q in 1 2 3 4; do echo "$q. quarter: " report_*-$q.pdf; done
1. quarter:  report_2013-1.pdf report_2014-1.pdf
2. quarter:  report_2013-2.pdf report_2014-2.pdf
3. quarter:  report_2013-3.pdf
4. quarter:  report_2013-4.pdf

If we are too lazy to type 1 2 3 4, we could have used $(seq 4) instead. This invokes the program seq with argument 4 and substitutes its output (1 2 3 4 in this case).

Now back to your problem: If you want chunk sizes that are a power of 10, you should be able to extend the above example to fit your needs.
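
As a rough sketch of that extension, assume six-digit, zero-padded numbers (as in the 010000 - 019999 example from the question), so that the first two digits identify a chunk of 10,000 files. The helper name chunk_prefixes is made up for illustration, and the rsync line is left as a comment because the source and destination are placeholders.

```bash
# Sketch: print one two-digit glob prefix per chunk of 10,000 files.
# Assumes six-digit, zero-padded numbers; chunk_prefixes is a hypothetical name.
# Pass the chunk numbers as plain decimals (8, not 08).
chunk_prefixes() {
    local i=$1 last=$2
    while [ "$i" -le "$last" ]; do
        printf '%02d\n' "$i"    # pad to two digits: 8 becomes 08
        i=$((i + 1))
    done
}

for prefix in $(chunk_prefixes 0 2); do
    echo "chunk $prefix: ${prefix}0000 to ${prefix}9999 (glob: ${prefix}*)"
    # rsync -rvhW --progress --size-only --dry-run <src>/"$prefix"* <dest>
done
```

For the full archive, the loop would run from 0 to 59 to cover 600,000 files.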

Old question, I know, but someone may find this useful. The above examples for expanding a range also work with rsync. For example, to copy files starting with a, b and c but not d and e from /tmp/from_here to /tmp/to_here:

$ rsync -avv /tmp/from_here/[a-c]* /tmp/to_here
sending incremental file list
delta-transmission disabled for local transfer or --whole-file
alice/
bob/
cedric/
total: matches=0  hash_hits=0  false_alarms=0 data=0

sent 89 bytes  received 24 bytes  226.00 bytes/sec
total size is 0  speedup is 0.00
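
When the range does not line up with a glob prefix, another option is to generate the list of names and hand it to rsync through its --files-from option. The sketch below assumes the legendary_dcdm_3d<digits>.tif naming from the asker's comment and a six-digit, zero-padded number, which is a guess based on the 010000 - 019999 example; range_names is a hypothetical helper. The rsync call is shown as a comment with --dry-run, since the directories are only examples.

```bash
# Sketch: print the file names for the numbers first..last, one per line.
# The six-digit width and the name pattern are assumptions.
# Pass the bounds as plain decimals (10000, not 010000).
range_names() {
    local n=$1 last=$2
    while [ "$n" -le "$last" ]; do
        printf 'legendary_dcdm_3d%06d.tif\n' "$n"
        n=$((n + 1))
    done
}

list=$(mktemp)                       # temporary file holding the names
range_names 10000 19999 > "$list"    # 010000 through 019999

# Names in the list are taken relative to the source directory:
# rsync -rvhW --progress --size-only --dry-run --files-from="$list" /tmp/from_here/ /tmp/to_here/
```

This avoids both a long chain of wildcards and a very long command line, because the names are read from the file rather than passed as arguments.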

If you are writing to LTO6 tapes, you should consider adding --inplace to your command. With --inplace, rsync writes data directly to the destination file instead of creating a temporary copy and renaming it, which suits linear filesystems such as LTO.
