
Bash - Regex to determine if output of ls -al is file or directory and hidden

I am trying to determine, for each line of ls -al output, whether the entry is a file or a directory and whether it is hidden, and to count how many entries there are of each type.

EDIT: I am not allowed to use find.

#!/bin/bash
#declare four different regex statements that match files, hidden files, directories and hidden directories (excluding . and ..)
#based on the output of each line of running ls -al
re_file='^\-[rwx\-]{9}\s[0-9]+\s([a-z_][a-z0-9_]{0,30})\s([a-z_][a-z0-9_]{0,30})\s[0-9]+\s\w{3}\s[0-9]+\s[0-9]{2}:[0-9]{2}\s[^\.](\w|\.)*$'
re_hidden_file='^\-[rwx\-]{9}\s[0-9]+\s([a-z_][a-z0-9_]{0,30})\s([a-z_][a-z0-9_]{0,30})\s[0-9]+\s\w{3}\s[0-9]+\s[0-9]{2}:[0-9]{2}\s\.\w(\w|\.)*$'
re_directory='^d[rwx\-]{9}\s[0-9]+\s([a-z_][a-z0-9_]{0,30})\s([a-z_][a-z0-9_]{0,30})\s[0-9]+\s\w{3}\s[0-9]+\s[0-9]{2}:[0-9]{2}\s[^\.](\w|\.)*$'
re_hidden_directory='^d[rwx\-]{9}\s[0-9]+\s([a-z_][a-z0-9_]{0,30})\s([a-z_][a-z0-9_]{0,30})\s[0-9]+\s\w{3}\s[0-9]+\s[0-9]{2}:[0-9]{2}\s\.\w(\w|\.)*$'
#declare four different counters for each type
file_count=0
hidden_file_count=0
directory_count=0
hidden_directory_count=0
#read through the output of ls -al line by line, assigning x the value of each line
ls -al $1 | while read x; do
  #test if each line matches each of the regex statements, if it does then increment the relevant counter
  if [[ $x =~ $re_file ]] ; then
    file_count+=1
  elif [[ $x =~ $re_hidden_file ]] ; then
    hidden_file_count+=1
  elif [[ $x =~ $re_directory ]] ; then
    directory_count+=1
  elif [[ $x =~ $re_hidden_directory ]] ; then
    hidden_directory_count+=1
  else
    echo "!!!"
  fi
done
total=$((file_count + hidden_file_count + directory_count + hidden_directory_count))
echo "Files found: $file_count (plus $hidden_file_count hidden)"
echo "Directories found: $directory_count (plus $hidden_directory_count hidden)"
echo "Total files and directories: $total"

Currently the script prints !!! for every line of ls -al output, because no line matches any of the regexes, and all of the counter variables remain at 0. Here is an example of the input (I believe Bash removes the extra spaces used for padding before the regex checks are done):

drwx--x--x  37 username groupname  4096 Jan  8 14:37 .
drwxr-xr-x 235 root     root       4096 Nov 15 12:16 ..
drwx------   3 username groupname  4096 Oct 27 14:35 .adobe
-rw-------   1 username groupname 14458 Dec  5 20:24 .bash_history
-rw-------   1 username groupname  2680 Sep 30 16:12 .bash_profile
-rw-------   1 username groupname  1210 Oct  7 09:40 .bashrc
drwx------  12 username groupname  4096 Dec  6 15:24 .cache
drwxr-xr-x  17 username groupname  4096 Jan  8 14:37 .config
drwx------   4 username groupname  4096 Dec  5 17:51 dir1
drwx------   2 username groupname  4096 Nov 23 12:26 dir2
...

I have tested the regexes in an online regex checker and they match as I intend, so I assume this is a Bash-specific problem. Any help is appreciated.
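A few Bash behaviors explain these symptoms independently of the regexes themselves. The sketch below uses a sample line copied from the output above. It shows that read with a single variable trims only leading and trailing whitespace, so the padding spaces between columns survive and a pattern with a single \s between fields cannot match them; that the POSIX class [[:space:]]+ does tolerate the padding (\s and \w are not part of POSIX extended regular expressions, which [[ =~ ]] uses, so their support depends on the platform's regex library); and that piping into while runs the loop in a subshell, so any counter changes are lost when the loop ends.

```bash
#!/bin/bash
# Sketch: three Bash behaviors behind the "!!!" output and the zero counters.
# The sample line is copied from the ls -al output in the question.
line='drwx------   4 username groupname  4096 Dec  5 17:51 dir1'

# 1. read with one variable trims only leading/trailing whitespace,
#    so the runs of padding spaces between columns survive.
read -r x <<< "$line"
[[ $x == "$line" ]] && echo "read kept the padding spaces"

# 2. A pattern allowing exactly one space between columns (like a single \s)
#    fails on that padding; the POSIX class [[:space:]]+ accepts it.
one_space='^d[rwx-]{9} [0-9]+ '
any_space='^d[rwx-]{9}[[:space:]]+[0-9]+[[:space:]]'
[[ $x =~ $one_space ]] || echo "one-space pattern: no match"
[[ $x =~ $any_space ]] && echo "[[:space:]]+ pattern: match"

# 3. cmd | while ... runs the loop in a subshell, so the counter is lost.
count=0
printf '%s\n' "$line" "$line" | while read -r x; do count=$((count+1)); done
echo "after pipe: $count"

# Redirecting from a process substitution keeps the loop in the current shell.
count=0
while read -r x; do count=$((count+1)); done < <(printf '%s\n' "$line" "$line")
echo "after process substitution: $count"
```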

You should not parse the output of ls to get file names. Use find with NUL-terminated output (-print0), or globbing, instead.

The problem is that ls produces ambiguous output for file names that are perfectly legal. Consider:

$ touch a$'\t'b
$ touch a$'\n'b
$ ls -l a*
-rw-r--r--  1 andrew  wheel  0 Jan  8 08:25 a?b
-rw-r--r--  1 andrew  wheel  0 Jan  8 08:26 a?b

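The unprintable characters \t and \n are replaced with ?, so those two files cannot be told apart in the ls output. (That substitution happens when ls writes to a terminal; when its output is piped, as in the question, the raw characters are printed, so a newline in a name splits that entry across two lines.)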

The same will happen with trailing spaces:

$ touch "a b c   "
$ touch "a b c       "
$ ls -al a\ b*
-rw-r--r--  1 andrew  wheel  0 Jan  8 08:44 a b c   
-rw-r--r--  1 andrew  wheel  0 Jan  8 08:44 a b c   
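To confirm that these really are four different names even though ls prints them identically, the sketch below recreates them in a scratch directory (made with mktemp -d, which is assumed to be available) and prints each name with printf %q, which spells out tabs, newlines and trailing spaces, together with its length in characters:

```bash
#!/bin/bash
# Sketch: recreate the four look-alike names and show they are distinct.
dir=$(mktemp -d)                  # scratch directory (assumes mktemp is available)
cd "$dir" || exit 1
touch a$'\t'b a$'\n'b "a b c   " "a b c       "

names=( a* )                      # the glob yields each name intact, no parsing
echo "matched ${#names[@]} names"
for fn in "${names[@]}"; do
    # %q spells out tabs, newlines and trailing spaces unambiguously
    printf '%q (length %d)\n' "$fn" "${#fn}"
done

cd / && rm -rf "$dir"
```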

Now consider using find:

$ find . -maxdepth 1 -name "a*" -print0 | xargs -0 printf "'%s'\n"
'./a    b'
'./a
b'
'./a b c   '
'./a b c      '
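For readers who are free to use find, the usual way to consume -print0 output directly in a bash loop (rather than through xargs) is read -d '', which splits its input on NUL bytes. A sketch, again with a hypothetical scratch directory holding three of the awkward names:

```bash
#!/bin/bash
# Sketch: read find's NUL-terminated output directly in a bash loop.
dir=$(mktemp -d)                  # scratch directory with awkward names
touch "$dir/a"$'\t'"b" "$dir/a"$'\n'"b" "$dir/a b c   "

count=0
# IFS= keeps leading/trailing blanks, -r keeps backslashes, -d '' splits on NUL
while IFS= read -r -d '' fn; do
    printf '%q\n' "$fn"
    count=$((count+1))
done < <(find "$dir" -maxdepth 1 -name 'a*' -print0)

echo "found $count names"
rm -rf "$dir"
```

Because the loop reads from a process substitution instead of a pipe, count is still set after the loop ends.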

Or just globbing:

$ for fn in a*; do printf "'%s'\n" "$fn"; done
'a  b'
'a
b'
'a b c   '
'a b c      '
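One caveat with globbing: when a pattern matches nothing, Bash leaves it unexpanded by default, so the loop body runs once with the literal pattern as its value. A minimal sketch in an empty scratch directory, showing the default behavior and the nullglob option that changes it:

```bash
#!/bin/bash
# Sketch: what a glob does when nothing matches.
dir=$(mktemp -d)
cd "$dir" || exit 1               # empty scratch directory

# Default: no match, so the loop sees the literal pattern and prints 'a*'
for fn in a*; do printf "'%s'\n" "$fn"; done

shopt -s nullglob                 # unmatched globs now expand to nothing
matches=( a* )
echo "nullglob matches: ${#matches[@]}"
shopt -u nullglob

cd / && rm -rf "$dir"
```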

If you want totals for files and directories, including hidden ones, just add a second loop over the hidden-name pattern .* as well:

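#!/bin/bash
# nullglob: a pattern that matches nothing (e.g. * in an empty directory)
# expands to nothing instead of being counted as a file literally named *
shopt -s nullglob
file_count=0
hidden_file_count=0
regular_directory_count=0
hidden_directory_count=0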

echo "=====regular files and directories:"
for fn in *; do 
    printf "'%s'\n" "$fn" 
    if [ -d "$fn" ]; then
        regular_directory_count=$((regular_directory_count+1))
    else
        file_count=$((file_count+1))
    fi      
done
echo "====hidden files and directories:"
for fn in .*; do
    printf "'%s'\n" "$fn"
    if [ -d "$fn" ]; then
        hidden_directory_count=$((hidden_directory_count+1))
    else
        hidden_file_count=$((hidden_file_count+1))
    fi
done

printf "Regular files: %s regular directories: %s\n" $file_count $regular_directory_count
printf "Hidden files:  %s hidden directories:  %s\n" $hidden_file_count $hidden_directory_count
tf=$((hidden_file_count+file_count))
td=$((hidden_directory_count+regular_directory_count))
printf "Total files:   %s total directories:   %s\n"  $tf $td

Given:

$ ls -la
total 0
drwxr-xr-x   9 andrew  wheel   306 Jan  8 11:07 .
drwxrwxrwt  92 root    wheel  3128 Jan  8 10:58 ..
drwxr-xr-x   2 andrew  wheel    68 Jan  8 11:07 .hidden dir
-rw-r--r--   1 andrew  wheel     0 Jan  8 11:26 .hidden file
-rw-r--r--   1 andrew  wheel     0 Jan  8 11:26 a?b
-rw-r--r--   1 andrew  wheel     0 Jan  8 11:26 a?b
-rw-r--r--   1 andrew  wheel     0 Jan  8 11:26 a b c   
-rw-r--r--   1 andrew  wheel     0 Jan  8 11:26 a b c       
drwxr-xr-x   2 andrew  wheel    68 Jan  8 11:07 regular dir

Run that and you get:

=====regular files and directories:
'a  b'
'a
b'
'a b c   '
'a b c       '
'regular dir'
====hidden files and directories:
'.'
'..'
'.hidden dir'
'.hidden file'
Regular files: 4 regular directories: 1
Hidden files:  1 hidden directories:  3
Total files:   5 total directories:   4

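If you want to exclude the . and .. hidden directories, you can set GLOBIGNORE=".:.." just before the .* loop. Don't set it before the * loop: setting GLOBIGNORE to a non-null value also enables dotglob, so * would then match hidden names too. Also note that Bash 5.2 and later skip . and .. in glob results by default (the globskipdots option), so on those versions the hidden directory count in the output above would be 1 rather than 3.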
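A small sketch of the GLOBIGNORE approach, applied only around the .* expansion and unset afterwards (unsetting GLOBIGNORE also switches the dotglob behavior back off). The scratch directory holds names like the example above:

```bash
#!/bin/bash
# Sketch: use GLOBIGNORE only around the .* expansion.
dir=$(mktemp -d)
cd "$dir" || exit 1
mkdir '.hidden dir'
touch '.hidden file' visible

GLOBIGNORE=".:.."        # . and .. are ignored while this is set (dotglob is on, too)
hidden=( .* )            # expands to: .hidden dir, .hidden file
unset GLOBIGNORE         # unsetting it turns dotglob back off

hidden_dirs=0
for fn in "${hidden[@]}"; do
    if [ -d "$fn" ]; then
        hidden_dirs=$((hidden_dirs+1))
    fi
done
echo "hidden entries: ${#hidden[@]}, hidden directories: $hidden_dirs"

cd / && rm -rf "$dir"
```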

Took me a while but got it to work.

My approach: avoid parsing the output of ls -l; here you especially don't need it. Use shopt to enable the dotglob option so that * in the for loop also matches hidden entries, then test each entry's type directly.
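To see what dotglob changes, here is a minimal sketch in a scratch directory holding one visible and one hidden file. Even with dotglob set, * never matches . or .., so the count goes from 1 to 2, not 4:

```bash
#!/bin/bash
# Sketch: effect of dotglob on the * pattern.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch visible .hidden

without=( * )          # default: only the visible name
shopt -s dotglob
with=( * )             # visible and .hidden, but never . or ..
shopt -u dotglob
echo "without dotglob: ${#without[@]}, with dotglob: ${#with[@]}"

cd / && rm -rf "$dir"
```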

Also: a+=1 doesn't do what you think it does. On an ordinary variable it appends the character 1 to the string, so a counter that starts at 0 becomes 01, then 011, and so on.
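A quick sketch of the difference, with two ways to get real arithmetic: arithmetic expansion or an arithmetic command, or declaring the variable as an integer with declare -i so that += adds instead of appending:

```bash
#!/bin/bash
# Sketch: string append vs. arithmetic increment in bash.
n=0
n+=1                  # plain variable: string concatenation, n is now "01"
echo "$n"

m=0
m=$((m + 1))          # arithmetic expansion: m is 1
((m += 1))            # arithmetic command: m is 2
echo "$m"

declare -i k=0        # integer attribute: += is now evaluated arithmetically
k+=1
echo "$k"
```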

#!/bin/bash
# regex that matches names starting with a dot, i.e. hidden files and directories
re_hidden_file='^\..*'
#declare four different counters for each type
file_count=0
hidden_file_count=0
directory_count=0
hidden_directory_count=0

# dotglob: let * match hidden files/directories too (. and .. are still never matched)
# nullglob: expand * to nothing in an empty directory instead of the literal string *
shopt -s dotglob nullglob
# loop over every entry in the current directory; there is no ls output to parse
for x in * ; do
  # test the entry's type first, then whether its name is hidden, and increment the relevant counter
  if [ -d "$x" ] ; then
    if [[ "$x" =~ $re_hidden_file ]] ; then
      hidden_directory_count=$((hidden_directory_count+1))
    else
      directory_count=$((directory_count+1))
    fi
  else
    if [[ "$x" =~ $re_hidden_file ]] ; then
      hidden_file_count=$((hidden_file_count+1))
    else
      file_count=$((file_count+1))
    fi
  fi
done


total=$((file_count + hidden_file_count + directory_count + hidden_directory_count))
echo "Files found: $file_count (plus $hidden_file_count hidden)"
echo "Directories found: $directory_count (plus $hidden_directory_count hidden)"
echo "Total files and directories: $total"
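If you want the same interface as the question's script (a directory passed as $1), one option is to glob on "$dir"/* instead of changing directory. The function below is a hypothetical sketch: the name count_entries and its space-separated output are made up for illustration. It saves and restores the caller's dotglob/nullglob settings with shopt -p, and uses a glob match ([[ $name == .* ]]) rather than a regex to detect hidden names:

```bash
#!/bin/bash
# Hypothetical helper: count_entries DIR
# Prints "files hidden_files dirs hidden_dirs" for the entries directly inside DIR.
count_entries() {
    local dir=${1:-.} path name
    local f=0 hf=0 d=0 hd=0
    local saved
    saved=$(shopt -p dotglob nullglob)   # remember the caller's settings
    shopt -s dotglob nullglob            # include hidden names; empty dir -> no iterations
    for path in "$dir"/*; do
        name=${path##*/}                 # strip the directory part
        if [[ -d $path ]]; then
            if [[ $name == .* ]]; then hd=$((hd+1)); else d=$((d+1)); fi
        else
            if [[ $name == .* ]]; then hf=$((hf+1)); else f=$((f+1)); fi
        fi
    done
    eval "$saved"                        # restore dotglob/nullglob as they were
    echo "$f $hf $d $hd"
}

# example: count entries in the directory given as $1 (default: current directory)
count_entries "${1:-.}"
```

For example, in a directory containing f1, f2, .hf, d1/ and .hd/, count_entries would print 2 1 1 1.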

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.