
How to get a list of files that do not match a regular expression pattern?

I need help creating a Windows batch script for listing files that do not match this regular expression in a given directory:

^[0-9]{5}\s[A-Z].*$

Example:

Output should be the file names ABC_12345.txt, 123456-ABC.pdf and 1234 NO.doc.

But the file name 12345 ABC.txt should not be output by the batch script.

Furthermore, it would be awesome if the script could export the list to the file C:\temp\DoesNotMatch.txt.

FINDSTR can be used to filter the output of DIR to get the wanted list:

@dir /A-D /B | %SystemRoot%\System32\findstr.exe /I /R /V /C:"^[0123456789][0123456789][0123456789][0123456789][0123456789] [ABCDEFGHIJKLMNOPQRSTUVWXYZ]" >C:\temp\DoesNotMatch.txt

DIR outputs, because of option /A-D (attribute not directory), just the files in the current directory, and in bare format because of option /B, which means only the file name with file extension, but without file path. Run dir /? in a command prompt window for help on this command and its options.
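The question asks about "a given directory", while the single command line works on the current directory. A minimal sketch of a batch file that takes the directory as its first argument could look as follows. The variable name SourceDir, the fallback to the current directory and the creation of C:\temp are assumptions made for illustration and are not part of the original command line.

```bat
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem Hypothetical wrapper: the first argument is the directory to list.
rem The current directory is used if no argument is given.
set "SourceDir=%~1"
if not defined SourceDir set "SourceDir=."
rem Assumption: create the output directory if it does not exist yet.
if not exist "C:\temp\" md "C:\temp"
rem Same DIR and FINDSTR command line as above, applied to SourceDir.
rem 2>nul hides the DIR error message output if there is no file at all.
dir "%SourceDir%" /A-D /B 2>nul | %SystemRoot%\System32\findstr.exe /I /R /V /C:"^[0123456789][0123456789][0123456789][0123456789][0123456789] [ABCDEFGHIJKLMNOPQRSTUVWXYZ]" >"C:\temp\DoesNotMatch.txt"
endlocal
```

Saved for example as ListNotMatching.cmd (a hypothetical name), it could be run with a directory path enclosed in double quotes as the only argument.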

This output of DIR is redirected to FINDSTR with the redirection operator | (pipe). Please read the Microsoft article about Using command redirection operators for details.

FINDSTR runs, because of /I, a case-insensitive regular expression search (because of /R) for lines matching the expression specified in double quotes with option /C:. Because of option /V it outputs the inverted result, which means the lines on which the regular expression matched no string.
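The filter can be checked without creating any files by piping the four example names from the question through the same FINDSTR command line. This is only a sketch for testing the expression. The command block with ECHO lines stands in for the output of DIR.

```bat
@echo off
rem The ECHO lines simulate the bare DIR output with the names from the question.
(
echo ABC_12345.txt
echo 123456-ABC.pdf
echo 1234 NO.doc
echo 12345 ABC.txt
) | %SystemRoot%\System32\findstr.exe /I /R /V /C:"^[0123456789][0123456789][0123456789][0123456789][0123456789] [ABCDEFGHIJKLMNOPQRSTUVWXYZ]"
rem According to the question the first three names are wanted in the output
rem and 12345 ABC.txt is not, because it is the only one matching the expression.
```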

The option /C:"..." must be used here to specify the search string, which is interpreted as a regular expression instead of a literal string because of /R. With just "..." the space character would be interpreted as a separator between two regular expression search strings, and the two expressions would be applied with OR on each line.
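The difference can be observed with a small test, shown here as a sketch. The variable name Expr is chosen just for this illustration, and /I and /V are omitted so that only the effect of /C: is visible. The name 1234 NO.doc starts with only four digits, so it should not match the complete expression, but it does contain an uppercase letter, which is enough to match the second of the two OR-combined expressions.

```bat
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "Expr=^[0123456789][0123456789][0123456789][0123456789][0123456789] [ABCDEFGHIJKLMNOPQRSTUVWXYZ]"
rem Without /C: the space splits the argument into two expressions
rem combined with OR. The line should be output because of the second one.
echo 1234 NO.doc| %SystemRoot%\System32\findstr.exe /R "%Expr%"
rem With /C: the whole string including the space is one expression.
rem The line should not be output because it starts with four digits only.
echo 1234 NO.doc| %SystemRoot%\System32\findstr.exe /R /C:"%Expr%"
endlocal
```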

The regular expression search string looks a bit strange because the regular expression syntax supported by FINDSTR is very limited. Run findstr /? in a command prompt window for help on this command, its options and its regular expression support. I recommend additionally reading SS64 - FINDSTR and What are the undocumented features and limitations of the Windows FINDSTR command?

^ means beginning of line, which is the beginning of the file name because no file path is output.

[0-9] could be used, but it also matches ¹, ² and ³. For that reason [0123456789] is used to really match only one of those 10 digit characters.

A multiplier like {5} is not supported by FINDSTR. For that reason it is necessary to write the digit character class definition five times in the search expression.

The character class \s, matching any whitespace character according to the Unicode standard, is not supported by FINDSTR. But vertical whitespace characters are not allowed or are very unusual in file names, the horizontal tab character is not allowed in a file name, and a no-break space would be possible in a file name but is not very usual either. The special characters with Unicode code values U+1680, U+180E, U+2000 to U+2008 are most likely also never used in file names. So \s can be replaced by a normal space character.

[A-Z] could be used, but it also matches lots of other characters like ÄäÖöÜü, to list just a few here. So it is better to use [ABCDEFGHIJKLMNOPQRSTUVWXYZ] to match case-insensitively only the ASCII letters.
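As the two character classes have to be written out in full, the command line becomes long. One possible way to keep it readable in a batch file is defining each class once as an environment variable and referencing it as often as needed. This is just a sketch. The variable names Digit and Letter are made up for this example.

```bat
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem Each character class is defined once and referenced several times below.
set "Digit=[0123456789]"
set "Letter=[ABCDEFGHIJKLMNOPQRSTUVWXYZ]"
rem After variable expansion this is the same command line as above.
dir /A-D /B | %SystemRoot%\System32\findstr.exe /I /R /V /C:"^%Digit%%Digit%%Digit%%Digit%%Digit% %Letter%" >C:\temp\DoesNotMatch.txt
endlocal
```

The environment variable references are expanded by the Windows command processor before FINDSTR is started, so FINDSTR gets the fully written out search expression.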

The output of FINDSTR is redirected with > into the file C:\temp\DoesNotMatch.txt, which is overwritten if it already exists when the batch file with this single command line is executed.
