REGEX for Grep to capture Name of last file in Zip Archive

Well, it has finally happened: my Google-fu has failed me. Please help...

I have a batch file that goes through a directory and gets information from comic archives (.cbz files).

It generates a CSV file with the title, number of pages, resolution of the last page, size of the archive, and name of the artist.

This all works fine except for the resolution. Getting the resolution itself is no problem, but extracting the last page only works if the files in the archive are named a specific way: the files are named Page 000 onward, and I count the number of files and subtract 1. If the numbering deviates (first page is Page 801 and last is Page 868), it fails to extract the page because I am telling it to extract Page 068 instead of Page 868.

So I figured if I just get the actual name of the last page, I am golden.

I am trying to grep the last filename in a zip file by using:

7z l filename | grep -o -P Page\s[0-9]{3}\..*(?!Page\s[0-9]{3}\..*)

But that gives me all the filenames.

Here is the output I am trying to grep:

7-Zip [64] 9.38 beta  Copyright (c) 1999-2014 Igor Pavlov  2015-01-03

Listing archive: Christian Knockers {Pages 0801-0868} [Dark Lord].cbz

--
Path = Christian Knockers {Pages 0801-0868} [Dark Lord].cbz
Type = zip
Physical Size = 224551692

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2020-11-19 15:51:25 ....A      3589432      3589432  Page 801.png
2020-11-19 16:09:29 ....A      3455981      3455981  Page 802.png
2020-11-26 14:48:47 ....A      3017353      3017353  Page 803.png
2020-11-26 15:02:27 ....A      3627637      3627637  Page 804.png
2020-11-26 15:13:05 ....A      3212321      3212321  Page 805.png
<snip>
2021-03-19 15:37:49 ....A      3106721      3106721  Page 864.png
2021-03-19 15:37:19 ....A      2619460      2619460  Page 865.png
2021-03-19 15:37:21 ....A      3063014      3063014  Page 866.png
2021-03-19 15:36:38 ....A      2423233      2423233  Page 867.png
2021-03-19 15:36:41 ....A      2908774      2908774  Page 868.png
------------------- ----- ------------ ------------  ------------------------
2021-03-19 15:38:54          224542422    224542422  68 files

Kernel  Time =     0.015 =   18%
User    Time =     0.000 =    0%
Process Time =     0.015 =   18%    Virtual  Memory =      3 MB
Global  Time =     0.084 =  100%    Physical Memory =      7 MB

I am getting better and better at regex, but the only groups I have used are capturing groups. What I googled keeps saying negative lookahead, but I am not having any luck.

Any help is appreciated!

Use

grep -zoP '(?s)Page\s[0-9]{3}\.\w+(?!.*Page\s[0-9]{3}\.\w+)' file

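-z makes grep split its input on NUL bytes instead of newlines, so a text file with no NUL bytes is treated as a single line. (?s) allows the dot to match newline characters, which lets the lookahead scan across the remaining lines.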
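
To see the lookahead at work without an archive at hand, here is a minimal sketch that runs the same pattern over a few lines copied from the listing in the question. The variable names listing and last_page are made up for this illustration, and it assumes GNU grep built with PCRE support (-P). Because -z also makes grep terminate each output match with a NUL byte, the sketch strips it with tr before using the result.

```shell
#!/bin/sh
# Sample lines copied from the `7z l` output in the question
# (hypothetical stand-in for piping real 7z output).
listing='2021-03-19 15:37:21 ....A      3063014      3063014  Page 866.png
2021-03-19 15:36:38 ....A      2423233      2423233  Page 867.png
2021-03-19 15:36:41 ....A      2908774      2908774  Page 868.png
------------------- ----- ------------ ------------  ------------------------
2021-03-19 15:38:54          224542422    224542422  68 files'

# -z: read the whole listing as one record; (?s): let . cross newlines.
# The negative lookahead rejects any page name that is followed,
# anywhere later in the input, by another page name.
# tr removes the NUL byte that -z appends to the match.
last_page=$(printf '%s\n' "$listing" \
  | grep -zoP '(?s)Page\s[0-9]{3}\.\w+(?!.*Page\s[0-9]{3}\.\w+)' \
  | tr -d '\0')

echo "$last_page"   # prints: Page 868.png
```

With a real archive, the printf line would be replaced by 7z l followed by the archive name.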

EXPLANATION

--------------------------------------------------------------------------------
  Page                     'Page'
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  [0-9]{3}                 any character of: '0' to '9' (3 times)
--------------------------------------------------------------------------------
  \.                       '.'
--------------------------------------------------------------------------------
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character (0 or more times matching the most amount possible)
--------------------------------------------------------------------------------
    Page                     'Page'
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    [0-9]{3}                 any character of: '0' to '9' (3 times)
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of look-ahead
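
If the lookahead is not essential, a simpler route may be to let grep print every page name and keep only the final line with tail. This is an alternative sketch, not the approach from the answer above; the variable names listing and last_name are made up, and it assumes (as the regex approach does) that the last entry in the listing is the page you want. It uses only -o and -E, so it does not need PCRE support.

```shell
#!/bin/sh
# Sample lines copied from the `7z l` output in the question
# (hypothetical stand-in for piping real 7z output).
listing='Listing archive: Christian Knockers {Pages 0801-0868} [Dark Lord].cbz
2021-03-19 15:37:21 ....A      3063014      3063014  Page 866.png
2021-03-19 15:36:38 ....A      2423233      2423233  Page 867.png
2021-03-19 15:36:41 ....A      2908774      2908774  Page 868.png
2021-03-19 15:38:54          224542422    224542422  68 files'

# -o prints each match on its own line; tail -n 1 keeps the last one.
# "Pages 0801-0868" in the archive name does not match, because the
# pattern requires a space right after "Page".
last_name=$(printf '%s\n' "$listing" \
  | grep -oE 'Page [0-9]{3}\.[[:alnum:]_]+' \
  | tail -n 1)

echo "$last_name"   # prints: Page 868.png
```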
