
How to find all files in a directory with grep and regex?

I have a directory (Linux/Unix) on an Apache server with many subdirectories, each containing many files like this:

- Dir  
  - 2010_01/
    - 142_78596_101_322.pdf
    - 12_10.pdf
    - ...
  - 2010_02/   
    - ...

How can I find all files whose names look like *_*_*_*.pdf, where each * stands for digits only?

I tried to solve it like this:

ls -1Rl 2010-01 | grep -i '\(\d)+[_](\d)+[_](\d)+[_](\d)+[.](pdf)$' | wc -l

But the regular expression \(\d)+[_](\d)+[_](\d)+[_](\d)+[.](pdf)$ doesn't work with grep.

Edit 1: Trying, for example,

ls -l 2010-03 | grep -E '(\d+_){3}\d+\.pdf' | wc -l

just returns 0. So it doesn't work either.

Try using find.

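The command that satisfies your specification *_*_*_*.pdf, where each * is one or more digits (GNU find; -regex is matched against the whole path, and GNU find's regex types do not support \d, so [0-9] is used instead):

find 2010_10/ -regextype posix-extended -regex '.*/[0-9]+_[0-9]+_[0-9]+_[0-9]+\.pdf'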
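A quick sketch of how -regex behaves, assuming GNU find (for -regextype) and a throwaway tree with made-up file names: the pattern is matched against the whole path find prints, so a bare file-name pattern matches nothing, while a leading .*/ lets it absorb the directory part.

```shell
# Throwaway tree with made-up names (assumes GNU find for -regextype).
demo=$(mktemp -d)
mkdir -p "$demo/2010_10"
touch "$demo/2010_10/142_78596_101_322.pdf" \
      "$demo/2010_10/12_10.pdf" \
      "$demo/2010_10/1_2_3_x.pdf"

# Prints nothing: -regex must match the whole path, not just the file name.
find "$demo/2010_10/" -regextype posix-extended \
     -regex '[0-9]+_[0-9]+_[0-9]+_[0-9]+\.pdf'

# Prints only the path ending in 142_78596_101_322.pdf: .*/ absorbs the directories.
find "$demo/2010_10/" -regextype posix-extended \
     -regex '.*/[0-9]+_[0-9]+_[0-9]+_[0-9]+\.pdf'
```

Remove the demo tree afterwards with rm -rf "$demo".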

Based on the regex you tried, however, you seem to want a sequence of four numbers separated by underscores:

(\d+_){3}\d+\.pdf
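To see how the four-numbers pattern behaves, here is a minimal sketch against a few made-up names, assuming GNU grep. \d is Perl syntax, so it needs -P (available only when grep is built with PCRE), while -E needs [0-9] instead. The ^ and $ anchors are an addition of this sketch; without them a name such as 1_2_3_4_5.pdf would also match, through its substring 2_3_4_5.pdf.

```shell
# Made-up sample names, one per line.
names='142_78596_101_322.pdf
12_10.pdf
1_2_3_4_5.pdf
a1_2_3_4.pdf'

# Perl-style (\d) pattern: needs grep -P, i.e. GNU grep built with PCRE.
printf '%s\n' "$names" | grep -P '^(\d+_){3}\d+\.pdf$'

# POSIX extended equivalent: [0-9] instead of \d.
printf '%s\n' "$names" | grep -E '^([0-9]+_){3}[0-9]+\.pdf$'
```

Both commands print only 142_78596_101_322.pdf.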

Or do you want to match all names consisting solely of digits and underscores?

[\d_]+\.pdf
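For the digits-and-underscores alternative, a sketch in POSIX extended syntax with made-up names. Inside a POSIX bracket expression a backslash is literal, so [\d_] there would mean backslash, d or underscore; [0-9_] is the portable spelling.

```shell
names='142_78596_101_322.pdf
12_10.pdf
2010_report.pdf
7.pdf'

# -x: the pattern must match the entire line, so no ^...$ anchors are needed.
# Prints 142_78596_101_322.pdf, 12_10.pdf and 7.pdf; 2010_report.pdf is rejected.
printf '%s\n' "$names" | grep -xE '[0-9_]+\.pdf'
```

This pattern is looser than the four-numbers one: it also accepts names like 7.pdf or even _.pdf.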

First, you should use egrep instead of grep, or call grep with -E, to get extended patterns.

So this works for me:

$ cat test2.txt
- Dir  
  - 2010_01/
    - 142_78596_101_322.pdf
    - 12_10.pdf
    - ...
  - 2010_02/   
    - ...

Now egrep that file:

$ cat test2.txt | egrep '((?:\d+_){3}(?:\d+)\.pdf$)'
- 142_78596_101_322.pdf

Since there are parentheses around the whole pattern, the entire file name is captured.

Note that the pattern does NOT work with grep in its default (basic regex) mode:

$ cat test2.txt | grep '((?:\d+_){3}(?:\d+)\.pdf$)'
(no output)

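But it DOES work if you use the extended-pattern switch -E (the same as calling egrep). Whether -E accepts the Perl-style \d and (?:...) depends on the grep implementation; GNU grep needs -P for them: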

$ cat test2.txt | grep -E '((?:\d+_){3}(?:\d+)\.pdf$)'
- 142_78596_101_322.pdf 
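The same line can be matched without Perl-style syntax, which should work with any POSIX grep. A sketch under that assumption, in both modes; the temporary copy of test2.txt exists only for the demo.

```shell
# Recreate the sample file from the answer in a temporary location.
f=$(mktemp)
printf '%s\n' '- Dir' '  - 2010_01/' '    - 142_78596_101_322.pdf' \
    '    - 12_10.pdf' '    - ...' '  - 2010_02/' '    - ...' > "$f"

# Extended mode: [0-9] stands in for \d and a plain group for (?:...).
grep -E '([0-9]+_){3}[0-9]+\.pdf$' "$f"

# Basic mode: grouping and the interval are backslash-escaped, and [0-9][0-9]*
# replaces [0-9]+ because + is not a POSIX basic-regex operator.
grep '\([0-9][0-9]*_\)\{3\}[0-9][0-9]*\.pdf$' "$f"
```

Both commands print the 142_78596_101_322.pdf line; remove the temporary file with rm -f "$f".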

Thanks to gbchaosmaster and the wolf, I found a way that works for me:

Inside a directory:

find . | grep -P "(\d+_){3}\d+\.pdf" | wc -l

From the top-level directory (Dir):

find 20*/ | grep -P "(\d+_){3}\d+\.pdf" | wc -l
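A slightly stricter variant of the command above, as a sketch: it assumes file names contain no newlines, uses a hypothetical helper name count_matching_pdfs, and swaps -P for -E with [0-9] so PCRE support is not required. -type f skips directories, grep -c replaces wc -l, and the (^|/) ... $ anchors make the pattern cover the whole last path component, so names like x1_2_3_4.pdf or 1_2_3_4.pdf.bak are not counted.

```shell
# Hypothetical helper: count regular files under the given directories whose
# name is exactly four digit groups joined by underscores, plus .pdf.
count_matching_pdfs() {
    find "$@" -type f | grep -cE '(^|/)([0-9]+_){3}[0-9]+\.pdf$'
}

# Demo on a throwaway tree (names made up for illustration).
tree=$(mktemp -d)
mkdir -p "$tree/2010_01" "$tree/2010_02/5_6_7_8.pdf"   # the last one is a directory
touch "$tree/2010_01/142_78596_101_322.pdf" "$tree/2010_01/12_10.pdf" \
      "$tree/2010_02/x1_2_3_4.pdf" "$tree/2010_02/1_2_3_4.pdf.bak"

# Same shape as the original command: run from the top-level directory.
(cd "$tree" && count_matching_pdfs 20*/)   # prints 1
```

On the real server this would be run as count_matching_pdfs 20*/ from Dir.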
