I have a folder structure encompassing many thousands of folders. I would like to be able to find all the folders that, for example, contain multiple .txt files, or multiple .jpeg, or whatever without seeing any folders that contain only a single file of that kind.
The folders should all have only one file of a specific type, but this is not always the case and it is tedious to try to find them.
Note that the folders may contain many other files.
If possible, I'd like to match "FILE.JPG" and "file.jpg" as both matching a query on "file" or "jpg".
What I have been doing is simply running
find . -iname "*file*"
and going through the results manually.
Folders contain folders, sometimes 3 or 4 levels deep. For example, this tree:
first/
  second/
    README.txt
    readme.TXT
    readme.txt
    foo.txt
  third/
    info.txt
  third/fourth/
    raksljdfa.txt
Should return
first/second/README.txt
first/second/readme.TXT
first/second/readme.txt
first/second/foo.txt
when searching for "txt"
and
first/second/README.txt
first/second/readme.TXT
first/second/readme.txt
when searching for "readme"
Something like this sounds like what you want:
find . -type f -print0 |
awk -v re='[.]txt$' '
    BEGIN {
        RS = "\0"
        IGNORECASE = 1
    }
    {
        dir = gensub("/[^/]+$","",1,$0)
        file = gensub("^.*/","",1,$0)
    }
    file ~ re {
        dir2files[dir][file]
    }
    END {
        for (dir in dir2files) {
            if ( length(dir2files[dir]) > 1 ) {
                for (file in dir2files[dir]) {
                    print dir "/" file
                }
            }
        }
    }'
It's untested but should be close. It uses GNU awk for gensub(), IGNORECASE, true multi-dimensional arrays and length(array).
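If GNU awk is not available, the same idea can be sketched with POSIX awk. This is not the code above, just a rough equivalent under some assumptions: the function name find_multi is made up for illustration, the regex must be written in lowercase (matching is done against the lowercased file name via tolower()), and file names must not contain newlines because find's output is read line by line.

```shell
# Hypothetical helper: print files whose lowercased base name matches the
# regex in $1, but only for directories that hold more than one such file.
# $2 is the directory to search (defaults to the current directory).
find_multi() {
    find "${2:-.}" -type f |
    awk -v re="$1" '
    {
        path = $0
        dir = path;  sub("/[^/]+$", "", dir)   # strip the last path component
        file = path; sub("^.*/", "", file)     # keep only the last component
        if (tolower(file) ~ re) {              # case-insensitive via tolower()
            count[dir]++                       # matches seen in this directory
            list[dir] = list[dir] path "\n"    # remember the matching paths
        }
    }
    END {
        for (dir in count)
            if (count[dir] > 1)
                printf "%s", list[dir]
    }'
}

# Example calls (output order across directories is unspecified):
#   find_multi '[.]txt$'
#   find_multi 'readme'
```

Counting per directory and keeping the paths in a newline-joined string stands in for the nested arrays, which POSIX awk does not have.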
This pure Bash code should do it (with caveats, see below):
#! /bin/bash
fileglob=$1 # E.g. '*.txt' or '*readme*'
shopt -s nullglob # Expand to nothing if nothing matches
shopt -s dotglob # Match files whose names start with '.'
shopt -s globstar # '**' matches multiple directory levels
shopt -s nocaseglob # Ignore case when matching
IFS= # Disable word splitting
for dir in **/ ; do
    matching_files=( "$dir"$fileglob )
    (( ${#matching_files[*]} > 1 )) && printf '%s\n' "${matching_files[@]}"
done
Supply the pattern to be matched as an argument to the program when you run it, e.g.
myprog '*.txt'
myprog '*readme*'
(The quotes on the patterns are necessary to stop them matching files in the current directory.)
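A small sketch of why the quoting matters, using a throwaway directory and a made-up helper (count_args is not part of the script above; it only reports how many arguments it was given):

```shell
# Hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"
(
    cd "$demo_dir" || exit 1
    count_args *.txt      # unquoted: the shell expands the glob before the call
    count_args '*.txt'    # quoted: the pattern arrives as one literal argument
)
rm -rf "$demo_dir"
```

With two .txt files in the current directory, the unquoted call receives two file names, while the quoted call receives the single pattern that the script needs.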
The caveats regarding the code are:
- globstar was introduced with Bash 4.0. The code won't work with older Bash.
- Prior to Bash 4.3, globstar matches followed symlinks. This could lead to duplicate outputs, or even failures due to circular links.
- The **/ pattern expands to a list of all the directories in the hierarchy. This could take an excessively long time or use an excessive amount of memory if the number of directories is large (say, greater than ten thousand).
If your Bash is older than 4.3, or you have large numbers of directories, this code is a better option:
#! /bin/bash
fileglob=$1 # E.g. '*.txt' or '*readme*'
shopt -s nullglob # Expand to nothing if nothing matches
shopt -s dotglob # Match files whose names start with '.'
shopt -s nocaseglob # Ignore case when matching
IFS= # Disable word splitting
find . -type d -print0 \
    | while read -r -d '' dir ; do
        matching_files=( "$dir"/$fileglob )
        (( ${#matching_files[*]} > 1 )) \
            && printf '%s\n' "${matching_files[@]}"
    done
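To decide which of the two scripts applies on a given machine, the Bash version can be checked first. A minimal sketch, assuming the version string has the usual "major.minor.patch..." shape of $BASH_VERSION; the function name bash_at_least_4_3 is made up for illustration:

```shell
# Hypothetical helper: succeed if a Bash version string is 4.3 or newer.
# Uses $1 if given, otherwise the running shell's $BASH_VERSION.
bash_at_least_4_3() {
    ver=${1:-$BASH_VERSION}
    [ -n "$ver" ] || return 1          # not running under Bash, no version given
    major=${ver%%.*}                   # text before the first dot
    rest=${ver#*.}                     # text after the first dot
    minor=${rest%%.*}                  # text up to the next dot
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 3 ]; }
}

# Example:
#   if bash_at_least_4_3; then echo "globstar version is usable"; fi
```

Even when the check succeeds, the find-based script is still the safer choice for very large directory trees.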