Bash script, print filenames that contain a string

I have a folder with a couple of files that I need to organize/manipulate depending on whether both of them exist or only one of them exists.

In my folder, the files matching folder1/checkthese/*.bam are:

file1_aln.bam
file1_aln_sorted.bam

I have a script that checks whether I have the unsorted file (which is just *_aln.bam) and the sorted file (*_aln_sorted.bam), but I am having trouble getting it to do the right thing depending on whether both exist.

Here is my mini script:

for files in folder1/checkthese/*.bam
do
    if [[ ${files} =~ "_aln.bam" ]] && [[ ${files} =~ "_aln_sorted.bam" ]]
    then
        echo "both files exist, need to delete unsorted file only"
        echo "REMOVE $(basename ${files/_aln*}_aln.bam)"
        rm -f ${files/_aln*}_aln.bam
    elif [[ ${files} =~ "_aln_sorted.bam" ]] && [[ ! ${files} =~ "_aln.bam" ]]
    then
        echo "Only sorted file exists, all good"
    fi
done

But this is the output I get:

Only sorted file exists, all good.

But clearly the unsorted file exists, so for some reason the script skips the first branch of my loop and does not remove the _aln.bam file. I am not sure how to change the condition in my elif statement so that if ONLY the _aln_sorted.bam file exists, all is good and I don't need to delete anything. I think I should not be using && in my elif statement, but I thought the ! was essentially the boolean NOT here.

Dude, your comparison can't do what you want.

Your first comparison checks for files whose name contains both the _aln.bam and the _aln_sorted.bam string. The second checks for files whose name contains _aln_sorted.bam and doesn't contain _aln.bam!

So both comparisons look at the same single file name in every iteration! Neither of your file names contains both strings, so the first branch never runs.
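To make that concrete, here is a minimal sketch that applies the question's two tests to one name at a time. The classify function is a hypothetical helper introduced only for this illustration, and the sketch assumes Bash 3.2 or later, where a quoted right-hand side of =~ is matched literally.

```bash
#!/bin/bash
# Hypothetical helper: applies the question's two string tests to ONE name.
classify() {
    local name=$1
    if [[ $name =~ "_aln.bam" ]] && [[ $name =~ "_aln_sorted.bam" ]]; then
        echo "both"          # would need both substrings inside the same name
    elif [[ $name =~ "_aln_sorted.bam" ]] && [[ ! $name =~ "_aln.bam" ]]; then
        echo "sorted only"
    else
        echo "no branch"
    fi
}

classify file1_aln.bam          # prints: no branch
classify file1_aln_sorted.bam   # prints: sorted only
```

The "both" branch is never reached for either name, which matches the output in the question: the tests describe one file name, not the pair of files on disk.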

You need this:

#!/bin/bash

for file in /full_path/folder1/checkthese/*.bam
do
    if [[ ${file} =~ "_aln.bam" ]]
    then
        echo "Unsorted file was found! It will be removed!"
        echo "Removing the file named ${file}"
        # Quote the path so names containing spaces are not split into words
        rm -f -- "${file}"
        echo "File removed!"
    elif [[ ${file} =~ "_aln_sorted.bam" ]]
    then
        echo "${file} is a sorted file!"
    fi
done

-----------EDIT--------------------

Okay, I fixed my original script: instead of using booleans to check for strings in the filename, it now checks whether the files exist. This worked for me:

Originally I had this script as well but ran into similar problems:

for files in folder1/checkthese/*.bam
do
    # ${files/_aln*} strips everything from "_aln" onward, leaving the common prefix
    if [ -f "${files/_aln*}_aln.bam" ] && [ -f "${files/_aln*}_aln_sorted.bam" ]
    then
        echo "both files exist, need to delete unsorted file only"
        echo "REMOVE $(basename "${files/_aln*}_aln.bam")"
        rm -f -- "${files/_aln*}_aln.bam"
    # The second test must look at the UNSORTED file; testing the sorted file twice can never be true
    elif [ -f "${files/_aln*}_aln_sorted.bam" ] && [ ! -f "${files/_aln*}_aln.bam" ]
    then
        echo "Only sorted file exists, all good"
    fi
done

Output works now.
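As a variation on the existence check above, here is a hedged sketch that loops over the sorted files only and derives the unsorted name with suffix removal, so each pair is visited once. The function name redundant_unsorted is hypothetical, and the sketch assumes the file naming from the question (*_aln.bam and *_aln_sorted.bam).

```bash
#!/bin/bash
# Hypothetical helper: print every unsorted file in a directory that has a sorted sibling.
redundant_unsorted() {
    local dir=$1 sorted unsorted
    for sorted in "$dir"/*_aln_sorted.bam; do
        [ -e "$sorted" ] || continue           # the glob matched nothing
        unsorted="${sorted%_sorted.bam}.bam"   # file1_aln_sorted.bam -> file1_aln.bam
        if [ -f "$unsorted" ]; then
            printf '%s\n' "$unsorted"
        fi
    done
}

# Preview first, and delete only once the list looks right:
#   redundant_unsorted folder1/checkthese
#   redundant_unsorted folder1/checkthese | while IFS= read -r f; do rm -f -- "$f"; done
```

The function only prints names, so the list can be inspected before anything is removed.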

I will present a slightly less conventional solution, stressing two points:

  • prefer working with file lists the way you would with any other textual data
  • separate the logic from the destructive operations (so you can check what you're about to delete)

First create some test files:

mkdir data
seq 1 5 | xargs -I{} touch 'data/file_{}_aln.bam'

# first three of them have their sorted equivalents
seq 1 3 | xargs -I{} touch 'data/file_{}_aln_sorted.bam'

First let's check what files I'd delete:首先让我们检查一下我要删除哪些文件:

find data -name '*.bam' | sort | sed 's/_sorted//' | uniq -d

The complement is the set of files I still have to sort:

find data -name '*.bam' | sort | sed 's/_sorted//' | uniq -u
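To see why the two pipelines above pick out the right names, here is a small sketch that feeds them a fixed list instead of the output of find. The names function is hypothetical and just stands in for `find data -name '*.bam' | sort`.

```bash
#!/bin/bash
# Fixed sample names stand in for the output of: find data -name '*.bam' | sort
names() {
    printf '%s\n' \
        data/file_1_aln.bam \
        data/file_1_aln_sorted.bam \
        data/file_4_aln.bam
}

# Stripping "_sorted" turns a sorted/unsorted pair into two identical adjacent lines.
names | sed 's/_sorted//' | uniq -d   # prints data/file_1_aln.bam (has a sorted twin)
names | sed 's/_sorted//' | uniq -u   # prints data/file_4_aln.bam (no sorted twin yet)
```

uniq only compares adjacent lines, which is why the list is sorted before the suffix is stripped.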

After checking, I can do something like this to delete the files:

find data -name '*.bam' | sort | sed 's/_sorted//' | uniq -d | xargs rm

The final check of whether all unsorted files are gone can be done easily with:

ls data/*_aln.bam 

# or to get some numeric results:
ls data/*_aln.bam | wc -l

Of course the usual caveats apply: use sensible file names, or you have to use find -print0 | xargs -0 and deal with the consequences.
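For file names that may contain spaces or newlines, one possible NUL-delimited variant is sketched below. It is not the pipeline from this answer: it swaps sed and uniq for a Bash read loop, because NUL-aware sed and uniq options are GNU extensions. The function name list_redundant0 is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: emit NUL-terminated paths of unsorted files that have a sorted sibling.
list_redundant0() {
    find "$1" -name '*_aln_sorted.bam' -print0 |
    while IFS= read -r -d '' sorted; do
        unsorted="${sorted%_sorted.bam}.bam"   # derive the unsorted name
        if [ -f "$unsorted" ]; then
            printf '%s\0' "$unsorted"
        fi
    done
}

# Preview (NULs turned into newlines for display only):
#   list_redundant0 data | tr '\0' '\n'
# Delete after checking:
#   list_redundant0 data | xargs -0 rm -f --
```

The listing and the deletion stay separate, as in the pipeline above, so the preview step can still be run before anything is removed.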
