Multiline regex to catch zero-size filenames in a badly formatted file list

I have a captured file-list text file (produced by PowerShell on Windows 10).
I only have the resulting text file and can't access the original storage, so I have to work with this text file alone.
I can use the VS Code editor, or GNU tools like "awk" or "sed".

I need to create a "null file list" from it (a list of the zero-size files).
The file looks like this...

Directory: D:\etc

Mode                 LastWriteTime         Length Name                                                                 
----                 -------------         ------ ----                                                                 
d-----        2017-03-27  오전 11:41                Start_Here_Mac.app                                                   
-a----        2017-02-07   오후 5:00             0 Autorun.inf                                                          
-a----        2017-02-07   오후 5:00       17949304 Start_Here_Win.exe                                                   

Directory: D:\etc\Start_Here_Mac.app

Mode                 LastWriteTime         Length Name                                                                 
----                 -------------         ------ ----                                                                 
d-----        2017-03-27  오전 11:41                Contents                                                             


Directory: D:\etc\Start_Here_Mac.app\Contents

Mode                 LastWriteTime         Length Name                                                                 
----                 -------------         ------ ----                                                                 
d-----        2017-02-07   오후 5:00                Frameworks                                                           
d-----        2017-03-27  오전 11:41                _CodeSignature                                                       
-a----        2017-02-07   오후 5:00            854 Info.plist                                                           
-a----        2017-02-07   오후 5:00              0 PkgInfo                                                              

From this, I want to get the following...

D:\etc : Autorun.inf
D:\etc\Start_Here_Mac.app\Contents : PkgInfo
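
One way to sketch the transformation above without a multi-line regex is a single pass that remembers the most recent "Directory:" line and tests each row on its own. This is only a minimal Python sketch, not a definitive answer: the names `zero_size_files`, `DIR_RE`, `FILE_RE` and `SAMPLE` are made up for illustration, and the row pattern assumes every row has exactly the layout shown in the sample (mode, yyyy-mm-dd date, AM/PM marker, time, length, name).

```python
import re

# "Directory: <path>" header lines (leading/trailing blanks tolerated).
DIR_RE = re.compile(r"^\s*Directory:\s+(.+?)\s*$")

# File rows: mode, date, AM/PM marker, time, length, name.
# Assumption: rows whose mode starts with "d" are folders and have no
# Length column, so they are skipped via the (?!d) lookahead.
FILE_RE = re.compile(
    r"^(?!d)\S+\s+\d{4}-\d{2}-\d{2}\s+\S+\s+\d{1,2}:\d{2}\s+(\d+)\s(.+?)\s*$"
)

def zero_size_files(lines):
    """Yield 'directory : name' for every row whose Length is 0."""
    current_dir = None
    for line in lines:
        m = DIR_RE.match(line)
        if m:
            current_dir = m.group(1)  # remember the enclosing folder
            continue
        m = FILE_RE.match(line)
        if m and current_dir is not None and int(m.group(1)) == 0:
            yield f"{current_dir} : {m.group(2)}"

# A few rows copied from the listing in the question.
SAMPLE = [
    r"Directory: D:\etc",
    "Mode                 LastWriteTime         Length Name",
    "d-----        2017-03-27  오전 11:41                Start_Here_Mac.app   ",
    "-a----        2017-02-07   오후 5:00             0 Autorun.inf          ",
    "-a----        2017-02-07   오후 5:00       17949304 Start_Here_Win.exe   ",
]

for entry in zero_size_files(SAMPLE):
    print(entry)
```

Because the function only needs an iterable of lines, it can be fed an open file object, so a file of about a million lines is processed as a stream. PowerShell redirection often writes UTF-16, so opening the file with `encoding="utf-16"` may be necessary; that is a guess about how the capture was made, not something the listing shows.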

Ah...
The original text file is huge, almost a million lines.
So I have to do this job with regex.
I was quite good with regex 10 years ago, but I realize I have only ever dealt with single-line regex.

I couldn't come up with the right regular expression for this multi-line situation.
Please help me.

P.S. I also got stuck on VS Code's regex functionality. It behaves weirdly when dealing with multi-line regex. Are there any useful tips for handling this in VS Code?

The best I could come up with was a "2-step replacement" like this.
Find:

^Directory:\s(.+)$|^.+\s+0\s+(.+)$

Replace with:

$1_-_$2\n

Then every directory path (whether or not it contains a zero-size file) is printed out,
and the zero-size file names are printed between those lines.
So every directory (folder) line ends with "_-_" (it matches "_-_$"),
while every zero-size file line begins with "_-_" (it matches "^_-_").
Then I can run a second regex pass that extracts the "^_-_.*" lines and combines each one with the path above it to complete the job.
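
The two-step idea above can also be tried outside the editor. The following Python sketch is only an illustration under assumptions. It applies the same find pattern to one line at a time, so `\s` can never run across a line break, and it drops the trailing `\n` from the replacement for that reason. The helper names `mark` and `join_marked` are hypothetical.

```python
import re

MARK = "_-_"

# Step 1: the find pattern from the question, applied per line.
STEP1 = re.compile(r"^Directory:\s(.+)$|^.+\s+0\s+(.+)$")

def mark(line):
    """Directory lines become 'path_-_', zero-size rows become '_-_name'.
    Lines that match neither alternative are returned unchanged."""
    return STEP1.sub(r"\1" + MARK + r"\2", line)

def join_marked(lines):
    """Step 2: pair each '_-_name' line with the last 'path_-_' line above it."""
    current_dir = None
    for raw in lines:
        line = raw.strip()  # the Name column is padded with blanks
        if line.startswith(MARK):
            name = line[len(MARK):]
            if name and current_dir is not None:
                yield f"{current_dir} : {name}"
        elif line.endswith(MARK):
            current_dir = line[:-len(MARK)]
        # anything else is an unmarked row and is ignored

def zero_size_report(lines):
    """Run both steps over an iterable of lines."""
    return join_marked(mark(line.rstrip("\r\n")) for line in lines)
```

One caveat of this pattern, as far as I can tell: it cannot tell a Length of 0 from a lone "0" surrounded by blanks inside a file or folder name, so such names would be reported incorrectly.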
