
grep for two patterns independently (in different lines)

I have some directories with the following structure:

DAY1/ # Files under this directory should have DAY1 in the name.
|-- Date
|   |-- dir1 # Something wrong here, there are files with DAY2 and files with DAY1.
|   |-- dir2
|   |-- dir3
|   |-- dir4
DAY2/ # Files under this directory should all have DAY2 in the name.
|-- Date
|   |-- dir1
|   |-- dir2 # Something wrong here, there are files with DAY2, and files with DAY1.
|   |-- dir3
|   |-- dir4

In each dir there are hundreds of thousands of files with names containing DAY, for example 0.0000.DAY1.01927492. Files with DAY1 in the name should only appear under the parent directory DAY1.

Something went wrong when copying files around, so I now have a mix of DAY1 and DAY2 files in some of the dir directories.

I wrote a script to find folders that contain mixed files, so I can then look at them more closely. My script is the following:

for directory in */; do
    if ls "$directory" | grep -q DAY2; then
        if ls "$directory" | grep -q DAY1; then
            echo "mixed files in $directory"
        fi
    fi
done

The problem here is that I'm going through all the files twice, which doesn't make sense considering that I'd only have to look through them once.

What would be a more efficient way to achieve what I want?

If I understand you correctly, you need to recursively find the files under the DAY1 directory that have DAY2 in their names, and likewise the files under the DAY2 directory that have DAY1 in their names.

If so, for the DAY1 directory:

find DAY1/ -type f -name '*DAY2*'

This will get you the files under the DAY1 directory that have DAY2 in their names. Similarly, for the DAY2 directory:

find DAY2/ -type f -name '*DAY1*'

Both are recursive operations.


To get the directory names only:

find DAY1/ -type f -name '*DAY2*' -exec dirname {} +

Note that the current directory ($PWD) will be shown as "." (a single dot).

To list each directory only once, pipe the output to sort -u:

find DAY1/ -type f -name '*DAY2*' -exec dirname {} + | sort -u
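The two searches above can be folded into one loop that checks each parent directory for the other day's tag. This is only a minimal sketch: it assumes exactly two top-level directories named DAY1 and DAY2 as in the question, a dirname that accepts several operands (GNU coreutils does), and a throwaway tree under mktemp whose file names are made up for the demonstration.

```shell
# Build a small throwaway tree that mimics the layout in the question.
# All file names below are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/DAY1/Date/dir1" "$demo/DAY1/Date/dir2"
mkdir -p "$demo/DAY2/Date/dir1" "$demo/DAY2/Date/dir2"
touch "$demo/DAY1/Date/dir1/0.0000.DAY1.001" "$demo/DAY1/Date/dir1/0.0000.DAY2.002"
touch "$demo/DAY1/Date/dir2/0.0000.DAY1.003"
touch "$demo/DAY2/Date/dir1/0.0000.DAY2.004"
touch "$demo/DAY2/Date/dir2/0.0000.DAY2.005" "$demo/DAY2/Date/dir2/0.0000.DAY1.006"

cd "$demo" || exit 1

# Each pair is parent:wrong-tag. For every parent directory, find the files
# that carry the other day's tag and keep only their directory names.
misplaced=$(
  for pair in DAY1:DAY2 DAY2:DAY1; do
    parent=${pair%%:*}
    wrong=${pair##*:}
    find "$parent" -type f -name "*${wrong}*" -exec dirname {} +
  done | sort -u
)

# One line per directory that contains at least one misplaced file.
printf '%s\n' "$misplaced"
```

With the demo tree above, this should list DAY1/Date/dir1 and DAY2/Date/dir2, the two directories that hold a file with the wrong tag.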

Going through the files once rather than twice saves at most a factor of two, so switching to a single-pass approach might not actually be a win: the new approach could easily take twice as long per file.

So you'll definitely want to experiment; this isn't necessarily something you can confidently reason about in advance.

However, I will say that in addition to going through the files twice, the ls version also sorts them, which probably has a more-than-linear cost (unless ls is doing some kind of bucket sort). Eliminating that by writing ls --sort=none (a GNU ls option) instead of plain ls will actually improve your algorithmic complexity, and is almost certain to give a tangible improvement.
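Applied to the script from the question, that change could look like the sketch below. It assumes GNU ls, since --sort=none is a GNU option (its short form there is -U); the throwaway directories and file names are made up for the demonstration.

```shell
# Throwaway test data (hypothetical names): dir1 is mixed, dir2 is clean.
demo=$(mktemp -d)
mkdir "$demo/dir1" "$demo/dir2"
touch "$demo/dir1/0.0000.DAY1.001" "$demo/dir1/0.0000.DAY2.002"
touch "$demo/dir2/0.0000.DAY1.003"

cd "$demo" || exit 1

report=$(
  for directory in */; do
    # --sort=none lists entries in directory order, so ls skips the sort.
    # grep -q exits at the first match, which also ends the listing early.
    if ls --sort=none "$directory" | grep -q DAY2; then
      if ls --sort=none "$directory" | grep -q DAY1; then
        echo "mixed files in $directory"
      fi
    fi
  done
)

printf '%s\n' "$report"
```

This still makes two passes over a mixed directory; it only removes the sorting cost from each pass, so it is worth timing against the original on real data.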


But FWIW, here's a version that only goes through the files once, which you can try:

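for directory in */; do
  # List each matching entry once, NUL-delimited so unusual file names are safe.
  find "$directory" -mindepth 1 -maxdepth 1 \( -name '*DAY1*' -o -name '*DAY2*' \) -print0 \
  | { saw_day1=
      saw_day2=
      while IFS= read -r -d '' path ; do
        name=${path##*/}   # test the file name only, not the directory part
        if [[ "$name" == *DAY1* ]] ; then
          saw_day1=1
        fi
        if [[ "$name" == *DAY2* ]] ; then
          saw_day2=1
        fi
        if [[ "$saw_day1" ]] && [[ "$saw_day2" ]] ; then
          echo "mixed files in $directory"
          break   # stop reading as soon as both patterns have been seen
        fi
      done
    }
done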
