Remove characters and numbers from a string in perl

I'm trying to rename a bunch of files in my directory and I'm stuck at the regex part of it.

I want to remove certain characters from a filename which appear at the beginning.

Example1: _00-author--book_revision_

Expected: Author - Book (Revision)

So far, I can use regexes to replace the underscores with spaces, strip the leading digits and hyphen, and capitalize the first letter:

$newfile =~ s/_/ /g;
$newfile =~ s/^[0-9]//g;
$newfile =~ s/^[0-9]//g;
$newfile =~ s/^-//g;
$newfile = ucfirst($newfile);

This is not a good method. I need help in removing all characters until you hit the first letter, and when you hit the first '-' I want to add a space before and after '-'. Also when I hit the second '-' I want to replace it with '('.

Any guidance, tips or even suggestions on taking the right approach is much appreciated.

Your instructions and your example don't match.

According to your instructions,

s/^[^\pL]+//;    # Remove everything until first letter.
s/-/ - /;        # Replace first "-" with " - "
s/-[^-]*\K-/(/;  # Replace second "-" with "("
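
To make the mismatch concrete, here is a minimal sketch that runs those three substitutions on the sample name from the question. The wrapper name by_instructions is made up for illustration, and \K needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# by_instructions() is a hypothetical wrapper around the three substitutions.
sub by_instructions {
    my ($name) = @_;
    for ($name) {              # alias $_ to $name so the bare s/// edits it
        s/^[^\pL]+//;          # drop everything before the first letter
        s/-/ - /;              # pad the first "-" with spaces
        s/-[^-]*\K-/(/;        # turn the second "-" into "("
    }
    return $name;
}

print by_instructions('_00-author--book_revision_'), "\n";
# prints: author - (book_revision_
```

That is what the written instructions produce, not the "Author - Book (Revision)" from the example.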

According to your example,

s/^[^\pL]+//;
s/--/ - /;
s/_/ (/;
s/_/)/;
s/(?<!\pL)(\pL)/\U$1/g;
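
For comparison, a small sketch of the example-driven version applied to the same sample name. Again, by_example is only an illustrative name and not part of any module.

```perl
use strict;
use warnings;

# by_example() is a hypothetical wrapper around the five substitutions.
sub by_example {
    my ($name) = @_;
    for ($name) {                   # alias $_ to $name
        s/^[^\pL]+//;               # strip the leading "_00-"
        s/--/ - /;                  # the double hyphen becomes " - "
        s/_/ (/;                    # first remaining underscore opens the parenthesis
        s/_/)/;                     # the next underscore closes it
        s/(?<!\pL)(\pL)/\U$1/g;     # upper-case every letter not preceded by a letter
    }
    return $name;
}

print by_example('_00-author--book_revision_'), "\n";
# prints: Author - Book (Revision)
```

This assumes every filename has exactly the two trailing underscores shown in the question; other layouts would need a different pattern.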

$filename =~ s,^_\d+-(.*?)--(.*?)_(.*?)_$,\u\1 - \u\2 (\u\3),;

My Perl interpreter (using strict and warnings) says that this is better written as:

$filename =~ s,^_\d+-(.*?)--(.*?)_(.*?)_$,\u$1 - \u$2 (\u$3),;

The first one is probably too sed-like for its taste! (Of course, both versions work just the same.)

Explanation (as requested by stema):

$filename =~ s/
  ^       # matches the start of the string
  _\d+-   # matches an underscore, one or more digits and a hyphen-minus
  (.*?)-- # matches (non-greedily) anything before two consecutive hyphen-minuses
          #   and captures the entire match (as the first capture group)
  (.*?)_  # matches (non-greedily) anything before a single underscore and
          #   captures the entire match (as the second capture group)
  (.*?)_  # does the same as the one before (but captures the match as the
          #   third capture group, obviously)
  $       # matches the end of the string
/\u$1 - \u$2 (\u$3)/x;

The \u$1, \u$2 and \u$3 in the replacement simply tell Perl to insert capture groups 1 to 3 with their first character made upper-case. If you wanted to make an entire captured group upper-case, you would have to use \U instead.

The x flag turns on verbose mode, which tells the Perl interpreter that we want to use # comments in the pattern, so it will ignore these (and any white space in the regular expression, so if you want to match a space you have to use either \s, which matches any white space character, or a backslash followed by a space). Unfortunately I couldn't figure out how to tell Perl to ignore white space in the replacement specification, which is why I've written that on a single line.
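
One possible way around the single-line replacement, offered only as a sketch: with the e flag the replacement is evaluated as a Perl expression instead of a string, and an expression can be spread over several lines. The sub name pretty_name is made up for illustration.

```perl
use strict;
use warnings;

# pretty_name() is a hypothetical helper; the pattern is the one explained above.
sub pretty_name {
    my ($filename) = @_;
    $filename =~ s/
        ^ _\d+-      # leading underscore, digits and a hyphen
        (.*?) --     # author, up to the double hyphen
        (.*?) _      # book, up to the next underscore
        (.*?) _ $    # revision, up to the final underscore
    /
        ucfirst($1) . ' - ' .
        ucfirst($2) . ' (' . ucfirst($3) . ')'
    /xe;             # x: verbose pattern, e: replacement is Perl code
    return $filename;
}

print pretty_name('_00-author--book_revision_'), "\n";
# prints: Author - Book (Revision)
```

The trade-off is that \u is no longer available in the replacement, so ucfirst does the capitalizing.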

(Also note that I've changed my s terminator from , to / - Perl barked at me if I used the , with verbose mode turned on ... not exactly sure why.)
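
A likely reason for the complaint: Perl looks for the closing delimiter before it interprets the pattern, so a comma inside a # comment (there is one right after "an underscore" in the commented version) ends the pattern early when , is the delimiter. A sketch with braces as delimiters, which sidesteps this as long as the comments contain no unbalanced braces; tidy_name is a made-up name:

```perl
use strict;
use warnings;

# tidy_name() is a hypothetical helper used only for this illustration.
sub tidy_name {
    my ($filename) = @_;
    $filename =~ s{
        ^
        _\d+-     # an underscore, one or more digits, and a hyphen
        (.*?)--   # author, up to the double hyphen
        (.*?)_    # book, up to the next underscore
        (.*?)_    # revision, up to the final underscore
        $         # end of the string
    }{\u$1 - \u$2 (\u$3)}x;
    return $filename;
}

print tidy_name('_00-author--book_revision_'), "\n";
# prints: Author - Book (Revision)
```

The same caveat applies to / as a delimiter: a slash in a comment would cut the pattern short in the same way.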

So do you want to capitalize all the components of the new filename, or just the first one? Your question is inconsistent on that point.

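Note that if you are on Linux, you may have the Perl-based rename command (it is the default rename on Debian and Ubuntu; some other distributions ship a different, non-Perl rename under that name), which will take a Perl expression and use it to rename files for you, something like this: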

rename 'my ($a,$b,$r);$_ = "$a - $b ($r)" 
  if ($a, $b, $r) = map { ucfirst $_ } /^_\d+-(.*?)--(.*?)_(.*?)_$/' _*
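
If the Perl-based rename is not available, roughly the same thing can be done in a short script. This is only a sketch: new_name and the --apply switch are names invented here, and it assumes the files sit in the current directory and start with an underscore. Without --apply it only prints what it would do.

```perl
use strict;
use warnings;

# new_name() is a hypothetical helper: it returns the new filename, or
# nothing when the old name does not fit the _NN-author--book_revision_ layout.
sub new_name {
    my ($old) = @_;
    my ($author, $book, $rev) = $old =~ /^_\d+-(.*?)--(.*?)_(.*?)_$/
        or return;
    return sprintf '%s - %s (%s)',
        ucfirst($author), ucfirst($book), ucfirst($rev);
}

# --apply is an assumed switch for this sketch; the default is a dry run.
my $apply = grep { $_ eq '--apply' } @ARGV;

for my $old (glob '_*') {
    my $new = new_name($old) or next;      # skip names that do not match
    if (!$apply) {
        print "$old -> $new\n";            # dry run: only show the plan
    }
    elsif (-e $new) {
        warn "Skipping $old: $new already exists\n";
    }
    else {
        rename $old, $new or warn "Cannot rename $old: $!\n";
    }
}
```

Running it once without --apply and checking the printed pairs before renaming anything is the safer order.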

If they all follow that format then try:

my ($author, $book, $revision) = $newfiles =~ /-(.*?)--(.*?)_(.*?)_/;

print ucfirst($author) . " - $book ($revision)\n";
