
Remove characters and numbers from a string in Perl

I'm trying to rename a bunch of files in my directory and I'm stuck at the regex part of it.

I want to remove certain characters that appear at the beginning of a filename.

Example 1: _00-author--book_revision_

Expected: Author - Book (Revision)

So far, I have been able to use regexes to remove the underscores and capitalize the first letter:

$newfile =~ s/_/ /g;
$newfile =~ s/^[0-9]//g;
$newfile =~ s/^[0-9]//g;
$newfile =~ s/^-//g;
$newfile = ucfirst($newfile);

This is not a good method. I need help removing all characters up to the first letter. When I hit the first '-', I want to add a space before and after it. When I hit the second '-', I want to replace it with '('.

Any guidance, tips, or suggestions on the right approach would be much appreciated.

Your instructions and your example don't match.

According to your instructions,

s/^[^\pL]+//;    # Remove everything until first letter.
s/-/ - /;        # Replace first "-" with " - "
s/-[^-]*\K-/(/;  # Replace second "-" with "("
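
To see the mismatch concretely, here is a small sketch that applies those three substitutions to the filename from the question. The sub name by_instructions is made up for illustration, and \K assumes Perl 5.10 or newer.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the "instructions" substitutions in order.
sub by_instructions {
    my ($name) = @_;
    $name =~ s/^[^\pL]+//;      # drop everything before the first letter
    $name =~ s/-/ - /;          # first "-" becomes " - "
    $name =~ s/-[^-]*\K-/(/;    # the next "-" becomes "("
    return $name;
}

print by_instructions('_00-author--book_revision_'), "\n";
# prints: author - (book_revision_
```

The result is not the "Author - Book (Revision)" you expected, which is why the instructions and the example have to be treated separately.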

According to your example,

s/^[^\pL]+//;
s/--/ - /;
s/_/ (/;
s/_/)/;
s/(?<!\pL)(\pL)/\U$1/g;
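
As a rough check, the same five substitutions can be wrapped in a sub and run against the sample filename. The name by_example is only an illustration, and the sketch assumes every filename has the same shape as the one in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the "example" substitutions in order.
sub by_example {
    my ($name) = @_;
    $name =~ s/^[^\pL]+//;              # strip the leading "_00-"
    $name =~ s/--/ - /;                 # author/book separator
    $name =~ s/_/ (/;                   # first underscore opens the parenthesis
    $name =~ s/_/)/;                    # next underscore closes it
    $name =~ s/(?<!\pL)(\pL)/\U$1/g;    # upper-case each letter that starts a word
    return $name;
}

print by_example('_00-author--book_revision_'), "\n";
# prints: Author - Book (Revision)
```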

$filename =~ s,^_\d+-(.*?)--(.*?)_(.*?)_$,\u\1 - \u\2 (\u\3),;

My Perl interpreter (with strict and warnings enabled) says that this is better written as:

$filename =~ s,^_\d+-(.*?)--(.*?)_(.*?)_$,\u$1 - \u$2 (\u$3),;

The first one is probably too sed-ish for its taste! (Of course, both versions work the same.)

Explanation (as requested by stema):

$filename =~ s/
  ^       # matches the start of the string
  _\d+-   # matches an underscore, one or more digits and a hyphen-minus
  (.*?)-- # matches (non-greedily) anything before two consecutive hyphen-minuses
          #   and captures the entire match (as the first capture group)
  (.*?)_  # matches (non-greedily) anything before a single underscore and
          #   captures the entire match (as the second capture group)
  (.*?)_  # does the same as the one before (but captures the match as the
          #   third capture group)
  $       # matches the end of the string
/\u$1 - \u$2 (\u$3)/x;

The \u$1, \u$2 and \u$3 in the replacement simply tell Perl to insert capture groups 1 through 3 with their first character upper-cased. If you wanted to upper-case an entire captured group, you would use \U instead.
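
The difference between the two escapes is easy to try in an ordinary double-quoted string. This is a minimal sketch and the sub names are made up for illustration.

```perl
use strict;
use warnings;

# \u upper-cases only the next character.
sub first_upper { my ($s) = @_; return "\u$s"; }

# \U upper-cases everything up to \E or the end of the string.
sub all_upper   { my ($s) = @_; return "\U$s"; }

print first_upper('revision'), ' ', all_upper('revision'), "\n";
# prints: Revision REVISION
```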

The x flag turns on verbose mode, which tells the Perl interpreter that we want to use # comments, so it ignores them (and any whitespace in the regular expression, so if you want to match a space you have to use either \s or an escaped space, "\ "). Unfortunately, I couldn't figure out how to tell Perl to ignore whitespace in the *replacement* specification, which is why I've written that on a single line.
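
A quick sketch of how whitespace behaves under the x flag. The three subs are hypothetical and only test the strings 'ab' and 'a b'. Note that \s would also match tabs and newlines, so an escaped space or a bracketed space is the narrower choice.

```perl
use strict;
use warnings;

# Under /x the unescaped spaces are ignored, so this pattern means "ab".
sub matches_unescaped { my ($s) = @_; return $s =~ /\A a b \z/x     ? 1 : 0; }

# An escaped space is matched literally.
sub matches_escaped   { my ($s) = @_; return $s =~ /\A a \  b \z/x  ? 1 : 0; }

# A space inside a bracketed class is also literal under a single /x.
sub matches_class     { my ($s) = @_; return $s =~ /\A a [ ] b \z/x ? 1 : 0; }

print matches_unescaped('ab'), matches_unescaped('a b'),
      matches_escaped('a b'), matches_class('a b'), "\n";
# prints: 1011
```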

(Also note that I've changed my s delimiter from , to /. Perl barked at me when I used , with verbose mode turned on ... not exactly sure why.)
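
A likely explanation for the comma problem: Perl locates the closing delimiter before it interprets the pattern, so a delimiter character inside an x-mode comment (such as the comma in "an underscore, one or more digits") ends the pattern early. The sketch below sidesteps that with brace delimiters, which also keep the replacement visually separate. The sub name tidy_name is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the same substitution with s{...}{...}x delimiters.
sub tidy_name {
    my ($filename) = @_;
    $filename =~ s{
        ^        # start of the string
        _\d+-    # underscore, digits, hyphen: the commas here are harmless
        (.*?)--  # author
        (.*?)_   # book
        (.*?)_   # revision
        $        # end of the string
    }{\u$1 - \u$2 (\u$3)}x;
    return $filename;
}

print tidy_name('_00-author--book_revision_'), "\n";
# prints: Author - Book (Revision)
```

With braces as delimiters, the thing to avoid in comments is an unbalanced brace rather than a comma or slash.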

So do you want to capitalize all the components of the new filename, or just the first one? Your question is inconsistent on that point.

Note that if you are on Linux, you probably have the rename command, which takes a Perl expression and uses it to rename files for you, something like this:

rename 'my ($a,$b,$r);$_ = "$a - $b ($r)" 
  if ($a, $b, $r) = map { ucfirst $_ } /^_\d+-(.*?)--(.*?)_(.*?)_$/' _*
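
If that rename command is not available, roughly the same thing can be done in plain Perl with the built-in rename function. This is only a sketch under stated assumptions: the sub names new_name and rename_all are hypothetical, the files are assumed to sit in the current directory and start with an underscore, and the dry-run flag is there so you can inspect the plan before touching anything.

```perl
use strict;
use warnings;

# Map an old name to its new name, or return undef when the name does not
# fit the _NN-author--book_revision_ shape.
sub new_name {
    my ($old) = @_;
    my @parts = $old =~ /^_\d+-(.*?)--(.*?)_(.*?)_$/ or return;
    my ($author, $book, $revision) = map { ucfirst $_ } @parts;
    return "$author - $book ($revision)";
}

# Hypothetical driver: rename matching files in the current directory.
# With a true argument it only prints what it would do.
sub rename_all {
    my ($dry_run) = @_;
    for my $old (glob '_*') {
        my $new = new_name($old);
        next unless defined $new;
        if ($dry_run) { print "$old -> $new\n"; next; }
        if (-e $new)  { warn "skipping $old: $new already exists\n"; next; }
        rename $old, $new or warn "cannot rename $old: $!\n";
    }
}

rename_all(1);   # dry run: prints the planned renames only
```

Call rename_all(0) once the dry-run output looks right.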

If they all follow that format, then try:

my ($author, $book, $revision) = $newfiles =~ /-(.*?)--(.*?)_(.*?)_/;

print ucfirst($author ) . " - $book ($revision)\n";
