
AWK, SED, REGEX to rename files

I'm just learning to use regex, AWK and sed. I have a group of files that I'd like to rename; they all sit in one directory.

The naming pattern is consistent, but I would like to rearrange the filenames. Here is the format:

01._HORRIBLE_HISTORIES_S2.mp4
02._HORRIBLE_HISTORIES_S2.mp4

I'd like to rename them to HORRIBLE_HISTORIES_s02e01.mp4, where the e01 is taken from the first column. I know that I want to grab "01" from the first column, store it in a variable and then paste it after the S2 in each filename. At the same time, I want to remove it, along with the "._", from the beginning of the filename. Additionally, I want to change the "S2" to "s02".

If anyone would be so kind, could you help me write something using awk/sed and explain the procedure, so that I might learn from it?

for f in *.mp4; do
  # Split the name on "." and "_": $1=episode, $3 and $4=show name, $5="S2", $6=extension
  echo mv "$f" \
    "$(awk -F '[._]' '{ si = sprintf("%02d", substr($5,2));
                          print $3 "_" $4 "_s" si "e" $1 "." $6 }' <<<"$f")"
done
  • Loops over all *.mp4 files.
  • Renames each one to the result of the awk command, supplied via command substitution ($(...)).
  • The awk command splits the input filename into tokens on "." or "_" (making the first token available as $1, the second as $2, and so on). Because "._" is two separators in a row, $2 is an empty token, which is why the show name starts at $3.
  • First, the number in "_S{number}" is left-padded with 0 to 2 digits (a 0 is only prepended if the number doesn't already have 2 digits) and stored in the variable si (season index). If it's OK to always prepend 0, the awk program can be simplified to: { print $3 "_" $4 "_s0" substr($5,2) "e" $1 "." $6 }
  • The result, along with the remaining tokens, is then rearranged to form the desired filename.
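To see why the fields used are $1, $3, $4, $5 and $6, the following sketch prints every token awk produces for one sample name, then wraps the padded naming logic in a helper. The function name new_name is hypothetical, and like the answer it assumes a two-word show name.

```shell
#!/usr/bin/env bash
# Dump every token awk produces when splitting on "." or "_".
# "._" is two separators in a row, so $2 comes out empty.
name='01._HORRIBLE_HISTORIES_S2.mp4'
awk -F '[._]' '{ for (i = 1; i <= NF; i++) printf "$%d = \"%s\"\n", i, $i }' <<<"$name"

# Hypothetical helper: build the new name. %02d pads the season number
# with a leading 0 only when it has a single digit.
new_name() {
  awk -F '[._]' '{ si = sprintf("%02d", substr($5, 2))
                   print $3 "_" $4 "_s" si "e" $1 "." $6 }' <<<"$1"
}

new_name '01._HORRIBLE_HISTORIES_S2.mp4'
new_name '03._HORRIBLE_HISTORIES_S12.mp4'
```

The two calls print HORRIBLE_HISTORIES_s02e01.mp4 and HORRIBLE_HISTORIES_s12e03.mp4: the two-digit season is left alone.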

Note the echo before mv, which lets you safely preview the resulting commands; remove it to perform the actual renaming.

Alternative: a pure bash solution using a regular expression:

for f in *.mp4; do
  # Skip names that don't match; otherwise BASH_REMATCH would be empty.
  [[ $f =~ ^([0-9]+)\._([^.]+)_S([^.]+)\.(.+)$ ]] || continue
  echo mv "$f" \
"${BASH_REMATCH[2]}_s0${BASH_REMATCH[3]}e${BASH_REMATCH[1]}.${BASH_REMATCH[4]}"
done
  • Uses bash's regular-expression matching operator, =~, with capture groups (the parenthesized substrings (...)) to match each filename and extract the substrings of interest.
  • The match results are stored in the special array variable BASH_REMATCH: element 0 contains the entire match, element 1 what the first capture group matched, element 2 the second, and so on.
  • The mv command's target argument then assembles the capture-group matches in the desired order. Note that in this case, for simplicity, I've made the zero-padding of s{number} unconditional: a 0 is simply prepended.
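A quick way to see what the capture groups hold is to print BASH_REMATCH for one sample name. The sketch below also shows a variant, using a hypothetical helper new_name, that pads the season only when needed. It narrows the season group to [0-9]+ (an assumption that seasons are numeric) and uses 10# so that a season such as "08" is not rejected as an invalid octal number.

```shell
#!/usr/bin/env bash
# Keeping the regex in a variable avoids quoting surprises inside [[ ]].
re='^([0-9]+)\._([^.]+)_S([^.]+)\.(.+)$'

f='01._HORRIBLE_HISTORIES_S2.mp4'
if [[ $f =~ $re ]]; then
  # Index 0 is the whole match; 1..4 are the capture groups in order.
  for i in "${!BASH_REMATCH[@]}"; do
    printf 'BASH_REMATCH[%d] = %s\n' "$i" "${BASH_REMATCH[i]}"
  done
fi

# Hypothetical helper: same idea as the loop above, but the season is padded
# to two digits only when needed (assumes the season is purely numeric).
new_name() {
  local pat='^([0-9]+)\._([^.]+)_S([0-9]+)\.(.+)$' season
  [[ $1 =~ $pat ]] || return 1
  # 10# forces base 10, so "08" is not treated as an octal number.
  printf -v season '%02d' "$((10#${BASH_REMATCH[3]}))"
  printf '%s_s%se%s.%s\n' "${BASH_REMATCH[2]}" "$season" "${BASH_REMATCH[1]}" "${BASH_REMATCH[4]}"
}
```

For the sample name, element 1 is 01, element 2 is HORRIBLE_HISTORIES, element 3 is 2 and element 4 is mp4. new_name returns a non-zero status for names that don't match, so a caller can skip them.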

As above, you need to remove the echo before mv to perform the actual renaming.

A common way of renaming multiple files according to a pattern is the Perl rename command. It uses Perl regular expressions and is very powerful. Use -n -v to test the pattern without touching the files:

$ rename -n -v 's/^(\d+)\._(.+)_S2\.mp4/$2_s02e$1.mp4/' *.mp4
01._HORRIBLE_HISTORIES_S2.mp4 renamed as HORRIBLE_HISTORIES_s02e01.mp4
02._HORRIBLE_HISTORIES_S2.mp4 renamed as HORRIBLE_HISTORIES_s02e02.mp4

Use parentheses to capture substrings into the variables $1 (first capture), $2 (second capture), etc.:

  • ^(\d+) captures the digits at the beginning of the filename (into $1)
  • \._(.+)_S2\.mp4 matches a literal "._", then captures everything between it and _S2.mp4 (into $2)
  • $2_s02e$1.mp4 assembles the new filename from the captured data, in the order you want

When you are happy with the result, remove -n from the command and it will rename the files for real.

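A Perl-based rename is available on many Linux systems, but the name is ambiguous: Debian and Ubuntu package it as rename (other distributions call it prename or perl-rename), while the rename from util-linux takes a plain from/to string instead of a Perl expression, so the command above won't work with it. There is a similar discussion here on SO with more details about finding/installing the right command.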
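Since the question also asked about sed, here is a sketch of the same substitution with sed -E, which expresses the captures through the back-references \1, \2 and \3. The helper name new_name is hypothetical, and the pattern assumes single-digit seasons; names that don't match come back unchanged and are skipped. Note the escaped \._, which matches a literal dot followed by an underscore.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite one name with sed -E (extended regex).
# \1 = episode number, \2 = show name, \3 = single-digit season.
new_name() {
  printf '%s\n' "$1" | sed -E 's/^([0-9]+)\._(.+)_S([0-9])\.mp4$/\2_s0\3e\1.mp4/'
}

# Preview the renames; remove "echo" to perform them.
for f in *.mp4; do
  [ -e "$f" ] || continue          # the glob matched no files
  new=$(new_name "$f")
  if [ "$new" != "$f" ]; then      # unchanged means the pattern didn't match
    echo mv -- "$f" "$new"
  fi
done
```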

You can do it in almost pure bash (with parameter expansion):

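for f in *.mp4; do
  # ${f:offset:length} extracts a substring; these offsets only fit names
  # laid out exactly like 01._HORRIBLE_HISTORIES_S2.mp4
  newfilename="${f:4:18}_s02e${f:0:2}.mp4"
  echo mv "$f" "$newfilename"
done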

If the output of this command suits your needs, remove the echo from the loop or, more simply (if your last command was the loop above), run: !! | bash (this relies on history expansion, so it only works in an interactive shell).
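The fixed offsets above only work while every name has exactly the same layout. As an alternative sketch (not part of the original answer), bash's prefix and suffix removal operators (#, ##, %, %%) can cut the name at its separators instead, so the show name may have any length. new_name is a hypothetical helper, and it assumes the NN._SHOW_SN.ext layout from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split the name at its separators rather than fixed offsets.
new_name() {
  local f=$1
  local ep=${f%%.*}          # "01": drop everything from the first "."
  local rest=${f#*._}        # "HORRIBLE_HISTORIES_S2.mp4": drop up to the first "._"
  local ext=${rest##*.}      # "mp4": keep what follows the last "."
  rest=${rest%.*}            # "HORRIBLE_HISTORIES_S2": drop the extension
  local show=${rest%_S*}     # "HORRIBLE_HISTORIES": drop the last "_S..."
  local season=${rest##*_S}  # "2": keep what follows the last "_S"
  # 10# keeps a season such as "08" from being read as octal.
  printf '%s_s%02de%s.%s\n' "$show" "$((10#$season))" "$ep" "$ext"
}

# Preview only; the glob limits the loop to names shaped roughly like NN._..._S...
for f in [0-9]*._*_S*.*; do
  [ -e "$f" ] || continue
  echo mv -- "$f" "$(new_name "$f")"
done
```

Because % removes the shortest matching suffix and ## the longest matching prefix, a show name that itself contains "_S" (for example a hypothetical THE_SIMPSONS) is still split at the last "_S".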

Put the filenames into a text file (for example with ls *.mp4 > input.txt), then use a loop and awk to rename the files.

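while IFS= read -r oldname; do
  # 1st awk: 01._HORRIBLE_HISTORIES_S2.mp4 -> HORRIBLE_HISTORIES_S2_e01.mp4
  # 2nd awk: -> HORRIBLE_HISTORIES_s02e01.mp4 (assumes a two-word show name)
  newname=$(awk -F'.' '{ print substr($2, 2) "_e" $1 "." $3 }' <<<"$oldname" |
            awk -F'_' '{ print $1 "_" $2 "_s0" substr($3, 2) $4 }')
  mv -- "$oldname" "$newname"
done < input.txt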
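The answer doesn't show where input.txt comes from. A minimal end-to-end sketch, run in a throwaway directory created with mktemp -d so no real files are touched, might look like this; it assumes the two-word show name that the awk field numbers rely on.

```shell
#!/usr/bin/env bash
# Demo only: create a scratch directory with two sample files.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 01._HORRIBLE_HISTORIES_S2.mp4 02._HORRIBLE_HISTORIES_S2.mp4

# Step 1: write the filenames to a text file, one per line.
ls -- *.mp4 > input.txt

# Step 2: rename each listed file with the two awk stages.
# Stage 1 (split on "."): 01._HORRIBLE_HISTORIES_S2.mp4 -> HORRIBLE_HISTORIES_S2_e01.mp4
# Stage 2 (split on "_"): HORRIBLE_HISTORIES_S2_e01.mp4 -> HORRIBLE_HISTORIES_s02e01.mp4
while IFS= read -r oldname; do
  newname=$(awk -F'.' '{ print substr($2, 2) "_e" $1 "." $3 }' <<<"$oldname" |
            awk -F'_' '{ print $1 "_" $2 "_s0" substr($3, 2) $4 }')
  mv -- "$oldname" "$newname"
done < input.txt

ls -- *.mp4
```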

If you're willing to use gawk, its regex matching really comes in handy (the three-argument form of match(), which stores the capture groups in an array, is a gawk extension). I find this pipe-based solution a little nicer than worrying about looping constructs.

ls -1 | \
    gawk 'match($0, /.../, a) { printf ... | "sh" } \
    END { close("sh") }'

For ease of reading, I've replaced the regex and the mv command with ellipses.

  • Line 1 lists all the filenames in the current directory, one per line, and pipes them to the gawk command.
  • Line 2 runs the regex match, assigning the captured groups to the array variable a. The action turns them into the desired command with printf, whose output is itself piped to sh for execution.
  • Line 3 closes the pipe to the shell that was implicitly opened when we started piping commands to it.

So then you just fill in your regex and command syntax (borrowing from mklement0's answer). For example (LIVE CODE WARNING: this renames the files immediately):

ls -1 | \
    gawk 'match($0, /^([0-9]+)\._([^.]+)_S([^.]+)\.(.+)$/, a) { printf "mv %s %s_s0%se%s.%s\n",a[0],a[2],a[3],a[1],a[4] | "sh" } \
    END { close("sh") }'

To preview the commands (as you should), simply remove the | "sh" from the second line.

Using AWK: rename each file to the first, second and fourth parts of its name (split on "."):

ls | while IFS= read -r file; do newfile=$(echo "$file" | awk -F . '{print $1 "." $2 "." $4}'); echo "$newfile"; mv -- "$file" "$newfile"; done
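This one-liner targets a different kind of name than the question's: it keeps the first, second and fourth dot-separated parts and drops the third. The sketch below, with a hypothetical helper drop_third_part and a made-up example name, makes that easy to check before renaming anything. It also shows that a name with only three parts, such as the question's, would lose its extension.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep parts 1, 2 and 4 of a dot-separated name.
drop_third_part() {
  awk -F . '{ print $1 "." $2 "." $4 }' <<<"$1"
}

drop_third_part 'show.s01e01.720p.mp4'           # four parts: the "720p" part is dropped
drop_third_part '01._HORRIBLE_HISTORIES_S2.mp4'  # three parts: $4 is empty, so "mp4" is lost

# Preview what every entry in the current directory would become.
for file in *; do
  [ -e "$file" ] || continue
  printf '%s -> %s\n' "$file" "$(drop_third_part "$file")"
done
```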

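Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.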

 
© 2020-2024 STACKOOM.COM