bash: looping over the files with extra conditions

In the working directory there are several files, divided into groups based on the suffix of the file name. Here is an example with 4 groups:

# group 1 has 5 files
NpXynWT_apo_300K_1.pdb
NpXynWT_apo_300K_2.pdb
NpXynWT_apo_300K_3.pdb
NpXynWT_apo_300K_4.pdb
NpXynWT_apo_300K_5.pdb
# group 2 has two files
NpXynWT_apo_340K_1.pdb
NpXynWT_apo_340K_2.pdb
# group 3 has 4 files
NpXynWT_com_300K_1.pdb
NpXynWT_com_300K_2.pdb
NpXynWT_com_300K_3.pdb
NpXynWT_com_300K_4.pdb
# group 4 has 1 file
NpXynWT_com_340K_1.pdb

I have written a simple bash workflow to:

  1. pre-process each of the files via sed: add something to each file
  2. cat together the pre-processed files that belong to the same group

Here is my script implementing the workflow. I created an array with the names of the groups and loop over it, with the file index running from 1 to 5:

# list of 4 groups
systems=(NpXynWT_apo_300K NpXynWT_apo_340K NpXynWT_com_300K NpXynWT_com_340K)

# loop over the groups
for model in "${systems[@]}"; do
    # loop over the files inside of each group
    for i in {0001..0005}; do
        # edit file via sed
        sed -i "1 i\This is $i file of the group" "${pdbs}"/"${model}"_"$i"_FA.pdb
    done
    # after editing, cat the pre-processed files
    cat "${pdbs}"/"${model}"_[1-5]_FA.pdb > "${output}/${model}.pdb"
done

The questions to improve this script: 1) How would it be possible to add some checks to the inner loop (e.g. by means of an if statement) so that only existing files are considered? In my example the script always loops over 5 files for each group, according to the maximum number of files in any one group (here, the 5 files in the first group):

for i in {0001..0005}; do

I would rather loop over all of the existing files of the given group and break out of the loop if a file does not exist (e.g. considering the 4th group, which has only 1 file). Here is an example, which however does not work properly:

# loop over the groups, checking for the presence of each file
for model in "${systems[@]}"; do
    i="0"
    # loop over the files inside of each group
    for i in {0001..9999}; do
        if [ ! -f "${pdbs}/${model}_00${i}_FA.pdb" ]; then
            echo 'File '${pdbs}/${model}_00${i}_FA.pdb' does not exist!'
            break
        else
            # edit file via sed
            sed -i "1 i\This is $i file of the group" "${pdbs}"/"${model}"_00"$i"_FA.pdb
            i=$[$i+1]
        fi
    done
done

Would it be possible to loop over however many files exist in the group, rather than hard-coding a very large upper bound as in the following line?

for i in {0001..9999}; do

  1. You can check if a file exists with the -f test, and break if it doesn't:

     if [ ! -f "${pdbs}/${model}_${i}_FA.pdb" ]; then break; fi
  2. Your existing cat command already picks up only the files that exist in each group. In "${pdbs}"/"${model}"_[1-5]_FA.pdb, bash performs filename expansion; it does not simply expand [1-5] to all possible values. You can see this in the following example:

     > touch f1 f2 f5 # files f3 and f4 do not exist
     > echo f[1-5]
     f1 f2 f5

    Notice that f[1-5] did not expand to f1 f2 f3 f4 f5.
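
Building on this, the inner loop can iterate over the glob itself instead of a fixed counter, so each group is processed for exactly as many files as exist. Below is a minimal sketch, assuming the `_FA.pdb` naming used in the question's script. The scratch directories, the dummy file contents and the three-group `systems` array are made up for the demonstration, and the `sed` line relies on GNU sed, as the original does.

```shell
#!/usr/bin/env bash
# Demo setup (hypothetical): scratch directories standing in for $pdbs and $output.
pdbs=$(mktemp -d)
output=$(mktemp -d)
for i in 1 2 3; do
    printf 'ATOM %s\n' "$i" > "$pdbs/NpXynWT_apo_300K_${i}_FA.pdb"
done
printf 'ATOM 1\n' > "$pdbs/NpXynWT_com_340K_1_FA.pdb"

# The middle group deliberately has no files at all.
systems=(NpXynWT_apo_300K NpXynWT_apo_340K NpXynWT_com_340K)

for model in "${systems[@]}"; do
    files=()
    # The glob expands only to files that actually exist (in lexical order).
    for f in "$pdbs/${model}"_*_FA.pdb; do
        # With no match the pattern stays a literal string, so skip non-files.
        [ -f "$f" ] || continue
        # Recover the index from the file name: strip directory, prefix, suffix.
        i=${f##*/}
        i=${i#"${model}_"}
        i=${i%_FA.pdb}
        # Prepend a header line (GNU sed syntax, as in the question).
        sed -i "1 i\This is $i file of the group" "$f"
        files+=("$f")
    done
    # Only write an output file for groups that have at least one member.
    if [ "${#files[@]}" -gt 0 ]; then
        cat "${files[@]}" > "$output/${model}.pdb"
    fi
done
```

Because the matches come back in lexical order, a file with index 10 would sort before index 2; if numeric order matters, the list would need to be sorted separately.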

Update:

If you want your glob expression to match files ending in numbers bigger than 9, the [1-n] syntax will not work. The reason is that the [...] syntax defines a pattern that matches a single character. For instance, the expression foo[1-9] will match files foo1 through foo9, but not foo10 or foo99.

Doing something like foo[1-99] does not work, because it doesn't mean what you might think it means. The inside of the [] can contain any number of individual characters, or ranges of characters. For example, [1-9a-nxyz] would match any character from '1' through '9', from 'a' through 'n', or any of the characters 'x', 'y', or 'z', but it would not match '0', 'q', 'r', etc. Nor, for that matter, would it match any uppercase letters.

So [1-99] is not interpreted as the range of numbers from 1 to 99; it is interpreted as the set of characters consisting of the range from '1' to '9', plus the individual character '9'. Therefore the patterns [1-9] and [1-99] are equivalent, and will only match characters '1' through '9'. The second 9 in the latter expression is redundant.

However, you can still achieve what you want with extended globs, which you can enable with the command shopt -s extglob:

> touch f1 f2 f5 f99 f100000 f129828523
> echo f[1-99999999999]       # Doesn't work like you want it to
f1 f2 f5
> shopt -s extglob
> echo f+([0-9])
f1 f2 f5 f99 f100000 f129828523

The +([0-9]) expression is an extended glob expression composed of two parts: the [0-9], whose meaning should be obvious at this point, and the enclosing +(...).

The +(pattern) syntax is an extglob expression that means match one or more instances of pattern. In this case, our pattern is [0-9], so the extglob expression +([0-9]) matches any string of digits 0-9.

However, you should note that this means it also matches things like 000000000. If you are only interested in numbers greater than or equal to 1, you would instead do (with extglob enabled):

> echo f[1-9]*([0-9])

Note the *(pattern) here instead of +(pattern). The * means match zero or more instances of pattern, which is what we want because we've already matched the first digit with [1-9]. For instance, f[1-9]+([0-9]) does not match the filename f1.
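
To compare the patterns side by side, the sketch below creates a few throwaway files in a temporary directory (the file names and variable names are made up) and expands each pattern there. The patterns are stored in variables and expanded unquoted, which is one way to avoid depending on extglob already being enabled when bash parses the line that contains the pattern.

```shell
#!/usr/bin/env bash
shopt -s extglob

# Scratch directory with a few hypothetical file names.
tmp=$(mktemp -d)
touch "$tmp"/f0 "$tmp"/f000 "$tmp"/f1 "$tmp"/f10 "$tmp"/f129

any_digits='f+([0-9])'             # one or more digits, all-zero names included
no_leading_zero='f[1-9]*([0-9])'   # first digit 1-9, then zero or more digits
two_or_more='f[1-9]+([0-9])'       # first digit 1-9, then at least one more digit

# Unquoted expansion inside the scratch directory: bash applies pathname
# expansion to the stored pattern when the command runs.
a=$(cd "$tmp" && echo $any_digits)
b=$(cd "$tmp" && echo $no_leading_zero)
c=$(cd "$tmp" && echo $two_or_more)

printf '%s\n' "$a" "$b" "$c"
# f0 f000 f1 f10 f129
# f1 f10 f129
# f10 f129

shopt -u extglob
```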

You may not want to leave extglob enabled in your whole script, particularly if you have any regular glob expression elsewhere in your script that might accidentally be interpreted as an extglob expression. To disable extglob when you're done with it, do:

shopt -u extglob

There's one other important thing to note here. By default, if a glob pattern doesn't match any files, it is treated as a raw string and left unmodified.

For example:

> echo This_file_totally_does_not_exist*
This_file_totally_does_not_exist*

Or more to the point in your case, suppose there are zero files in your 4th group, e.g. there are no files containing NpXynWT_com_340K. In this case, if you try to use a glob containing NpXynWT_com_340K, you get the entire glob as a literal string:

> shopt -s extglob
> echo NpXynWT_com_340K_[1-9]*([0-9])
NpXynWT_com_340K_[1-9]*([0-9])

This is obviously not what you want, especially in the middle of your script where you are trying to cat the matching files. Luckily there is another option you can set to make non-matching globs expand to nothing:

> shopt -s nullglob
> echo This_file_totally_does_not_exist*   # prints nothing

As with extglob, there may be unintended behavior elsewhere in your script if you leave nullglob on.
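
One way to limit such side effects is to switch nullglob on only around the expansion that needs it, collect the matches in an array, and check the array length before calling cat. Here is a minimal sketch, assuming the same `_FA.pdb` naming as the question. The scratch directory, the helper name `collect_group` and the `report` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Demo setup (hypothetical): one group with two files, another with none.
pdbs=$(mktemp -d)
printf 'ATOM 1\n' > "$pdbs/NpXynWT_apo_300K_1_FA.pdb"
printf 'ATOM 2\n' > "$pdbs/NpXynWT_apo_300K_2_FA.pdb"

# collect_group MODEL: fill the global array `files` with the group's files.
collect_group() {
    local model=$1 restore
    # Remember the current nullglob setting so it can be put back afterwards.
    # (shopt -p returns non-zero when the option is off, hence the "|| true".)
    restore=$(shopt -p nullglob || true)
    shopt -s nullglob
    files=( "$pdbs/${model}"_*_FA.pdb )
    eval "$restore"
}

report=""
for model in NpXynWT_apo_300K NpXynWT_com_340K; do
    collect_group "$model"
    if [ "${#files[@]}" -eq 0 ]; then
        report+="$model: no files, skipping"$'\n'
    else
        report+="$model: ${#files[@]} file(s)"$'\n'
    fi
done
printf '%s' "$report"
```

With the array in hand, `cat "${files[@]}"` is only run when the length is non-zero, so an empty group never reaches cat with a literal pattern or with no arguments.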
