
How can I use perl to delete files matching a regex

Due to a Makefile mistake, I have some fake files in my git repo...

$ ls
=0.1.1                  =4.8.0                  LICENSE
=0.5.3                  =5.2.0                  Makefile
=0.6.1                  =7.1.0                  pyproject.toml
=0.6.1,                 all_commands.txt        README_git_workflow.md
=0.8.1                  CHANGES.md              README.md
=1.2.0                  ciscoconfparse/         requirements.txt
=1.7.0                  configs/                sphinx-doc/
=2.0                    CONTRIBUTING.md         tests/
=2.2.0                  deploy_docs.py          tutorial/
=22.2.0                 dev_tools/              utils/
=22.8.0                 do.py
=2.7.0                  examples/
$

I tried this, but it seems that there may be some more efficient means to accomplish this task ...

# glob "*" will list all files globbed against "*"
foreach my $filename (grep { /\W\d+\.\d+/ } glob "*") {
    my $cmd1 = "rm $filename";
    `$cmd1`;
}

Question:

  • I want a remove command that matches against a pcre.
  • What is a more efficient perl solution to delete the files matching this perl regex: /\W\d+\.\d+/ (example filename: '=0.1.1') ?

Fetch a wider set of files and then filter through whatever you want

my @files_to_del = grep { /^\W[0-9]+\.[0-9]+/ and not -d } glob "*";

I added an anchor (^) so that the regex can only match a name that begins with that pattern; otherwise this can blow away files other than those intended. Reconsider what exactly you need.

Altogether, perhaps (or see the one-liners below)
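To make the effect of the anchor concrete, here is a small sketch that runs both patterns over a made-up list of names. The names `notes-1.2.txt` and the helper `matching` are only illustrative assumptions; nothing is deleted.

```perl
use strict;
use warnings;

# Hypothetical sample names: two stray files plus two that should survive.
my @names = ('=0.1.1', '=22.8.0', 'notes-1.2.txt', 'README.md');

my $loose    = qr/\W\d+\.\d+/;          # pattern from the question, unanchored
my $anchored = qr/^\W[0-9]+\.[0-9]+/;   # same idea, pinned to the start of the name

# Return the candidates that match the given compiled regex.
sub matching {
    my ($re, @candidates) = @_;
    return grep { $_ =~ $re } @candidates;
}

print "unanchored: @{[ matching($loose, @names) ]}\n";
print "anchored:   @{[ matching($anchored, @names) ]}\n";
```

The unanchored pattern also picks up `notes-1.2.txt` (via `-1.2` in the middle of the name), while the anchored one selects only the two names that start with `=`.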

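use warnings;
use strict;
use feature 'say';

use File::Glob ':bsd_glob';        # for better glob()
use File::Basename qw(basename);   # to match the file name, not the whole path
use Cwd qw(cwd);                   # current-working-directory

my $dir = shift // cwd;      # cwd by default, or from input

my $re = qr/^\W[0-9]+\.[0-9]+/;

# glob "$dir/*" returns names with the "$dir/" prefix, so the anchored
# regex has to be tested against the basename, not the full path
my @files_to_del = grep { basename($_) =~ $re and not -d } glob "$dir/*";

say for @files_to_del;  # please inspect first

#unlink or warn "Can't unlink $_: $!" for @files_to_del;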

where that * in glob might as well have some pre-selection, if suitable. In particular, if the = is a literal character (and not an indicator printed by the shell, see footnote) then glob "=*" will fetch files starting with it, and then you can pass those through a grep filter.

I exclude directories, identified by the -d filetest, since we are looking for files (and so as not to run into the scary language about directories in the unlink documentation; thanks to brian d foy's comment).

If you need to scan subdirectories and do the same in them, perhaps recursively (which doesn't seem to be the case here), then this logic can go into File::Find::find (or File::Find::Rule, or other modules).
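A minimal sketch of the recursive variant with File::Find follows. The helper name `find_matching` and the throwaway directory tree are assumptions made for the demonstration; the code only lists candidates and deletes nothing.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical helper: collect plain files under $top whose own name
# (not the full path) matches $re.
sub find_matching {
    my ($top, $re) = @_;
    my @found;
    find(sub {
        # Inside the callback $_ is the bare file name and the current
        # directory is the one holding it; $File::Find::name is the full path.
        push @found, $File::Find::name if /$re/ and -f;
    }, $top);
    return sort @found;
}

# Demo on a throwaway tree so that nothing real is touched.
my $top = tempdir(CLEANUP => 1);
make_path("$top/sub");
for my $name ('=0.1.1', 'README.md', 'sub/=2.0.3', 'sub/notes-1.2.txt') {
    open my $fh, '>', "$top/$name" or die "Can't create $top/$name: $!";
    close $fh;
}

print "$_\n" for find_matching($top, qr/^\W[0-9]+\.[0-9]+/);
```

Because the callback sees the bare name in `$_`, the `^` anchor keeps working at any depth; the returned paths could then be handed to `unlink` after inspection.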

Or read the directory any other way (opendir + readdir, libraries like Path::Tiny), and filter.
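For the opendir + readdir route, a sketch could look like this. The helper name `matching_files` and the temporary directory contents are assumptions for illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: names in $dir (no recursion) that match $re
# and are not directories.
sub matching_files {
    my ($dir, $re) = @_;
    opendir my $dh, $dir or die "Can't open $dir: $!";
    # readdir returns bare names (including . and ..), so the regex sees
    # only the name, but the -d test needs the directory prepended.
    my @names = sort grep { /$re/ and not -d "$dir/$_" } readdir $dh;
    closedir $dh;
    return @names;
}

# Demo in a throwaway directory: two stray files, one normal file,
# and one directory whose name also matches the pattern.
my $dir = tempdir(CLEANUP => 1);
for my $name ('=0.1.1', '=7.1.0', 'Makefile') {
    open my $fh, '>', "$dir/$name" or die "Can't create $dir/$name: $!";
    close $fh;
}
mkdir "$dir/=3.0" or die "Can't mkdir $dir/=3.0: $!";

print "$_\n" for matching_files($dir, qr/^\W[0-9]+\.[0-9]+/);
```

The directory `=3.0` matches the regex but is dropped by the `-d` test, so only the two plain files are listed.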


Or, a quick one-liner... print (to inspect) what's about to get blown away

perl -wE'say for grep { /^\W[0-9]+\.[0-9]+/ and not -d } glob "*"'

and then delete 'em

perl -wE'unlink or warn "$_: $!" for grep /^\W[0-9]+\.[0-9]+/ && !-d, glob "*"'

(I switched to a more compact syntax just so. Not necessary)

If you'd like to be able to pass a directory to it (optionally, or work in the current one) then do

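perl -wE'$d = shift//q(.); ...'  dirpath    # dirpath is optional; a relative path is fine

and then use glob "$d/*" in the code. The names returned then carry the "$d/" prefix, so the ^-anchored regex has to be applied to the last path component (for example via basename from File::Basename), or else chdir to $d first and keep glob "*". The default works the same way as in the script above: shift pulls the first element from @ARGV if anything was passed on the command line, or returns undef if @ARGV is empty, and then the // (defined-or) operator picks up the string q(.).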
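The shift-with-default idiom can be tried in isolation. In this sketch the helper `dir_from_args` is a hypothetical stand-in that takes its arguments explicitly instead of reading @ARGV.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring "shift // q(.)": use the first argument
# if one was given, otherwise fall back to the current directory.
sub dir_from_args {
    my @args = @_;
    return shift(@args) // '.';
}

print dir_from_args(), "\n";            # no arguments: falls back to "."
print dir_from_args('configs'), "\n";   # a directory was supplied
print dir_from_args('0'), "\n";         # defined-but-false value is kept
```

The last call shows why `//` is used rather than `||`: a directory literally named `0` is a defined value, so it is not replaced by the default.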


That leading = may be a file-type "indicator" if ls has been aliased to ls -F, which can be checked by running ls with aliases suppressed, one way being \ls (or check alias ls).

If that is so, the = marks the file as a socket, which in Perl can be tested for with the -S filetest.

Then that \W in the proposed regex may need to be changed to \W? to allow for no non-word character preceding the digits, along with a test for a socket. Like

my $re = qr/^\W? [0-9]+ \. [0-9]+/x;

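# basename() comes from File::Basename; glob returns "$dir/name", so match the name only
my @files_to_del = grep { basename($_) =~ $re and -S } glob "$dir/*";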

Why not just:

$ rm =*

Sometimes, shell commands are the best option.

In these cases, I use perl to merely filter the list of files:

ls | perl -ne 'print if /\A\W\d+\.\d+/a' | xargs rm

And, when I do that, I feel guilty for not doing something simpler with an extended pattern in grep (which has no \d, so the digits are spelled [0-9]):

ls | grep -E '^\W[0-9]+\.[0-9]+' | xargs rm

Eventually I'll run into a problem where there's a directory so I need to be more careful about the file list:

find . -maxdepth 1 -type f | grep -E '^\./\W[0-9]+\.[0-9]+' | xargs rm

Or I need to allow rm to remove directories too should I want that:

ls | grep -E '^\W[0-9]+\.[0-9]+' | xargs rm -r

Here you go.

unlink( grep { /\W\d+\.\d+/ && !-d } glob( "*" ) );

This matches the filename, and excludes directories.
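Since unlink returns the number of files it actually deleted, that count can be compared with the number selected. The sketch below does this in a throwaway directory; the helper name `delete_matching` and the anchored variant of the regex are assumptions added for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob);
use File::Basename qw(basename);

# Hypothetical helper: delete plain files in $dir whose name matches $re.
# Returns (number selected, number actually removed).
sub delete_matching {
    my ($dir, $re) = @_;
    my @targets = grep { basename($_) =~ $re and not -d } bsd_glob("$dir/*");
    my $removed = unlink @targets;   # unlink returns how many files it deleted
    return (scalar @targets, $removed);
}

# Demo in a temporary directory so that no real files are at risk.
my $dir = tempdir(CLEANUP => 1);
for my $name ('=0.1.1', '=4.8.0', 'LICENSE') {
    open my $fh, '>', "$dir/$name" or die "Can't create $dir/$name: $!";
    close $fh;
}

my ($selected, $removed) = delete_matching($dir, qr/^\W[0-9]+\.[0-9]+/);
print "selected $selected, removed $removed\n";
warn "some files could not be removed\n" if $removed != $selected;
```

A mismatch between the two counts signals that some unlink calls failed (for example on permissions), which the bare one-liner would pass over silently.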

To delete filenames matching this pcre, /\W\d+\.\d+/, use one of the following one-liners...

1> $fn is a filename... I'm also removing the my keywords, since the one-liner doesn't have to worry about perl lexical scopes:

perl -e 'foreach $fn (grep { /\W\d+\.\d+/ } glob "*") {$cmd1="rm $fn";`$cmd1`;}'

2> Or, as Andy Lester responded, perhaps his answer is as efficient as we can make it...

perl -e 'unlink(grep { /\W\d+\.\d+/ } glob "*");'
