
Find and verify path strings in text file using PowerShell, RegEx search

First time posting here. I'll try to be clear and detailed, but be gentle if I missed an existing answer when I searched these boards.

First, the issues:

  1. How to exclude a RegEx match that contains a specific keyword ("fastcopy")
  2. How to include path results that do not end in a file name/wildcard

I am working with a set of text files that are very similar to batch files. They are plain text and contain header lines, lines containing paths to files on a server, and comment lines. Comment lines begin with a semicolon (;), so they are simple enough to rule out. The paths should all start with the variable %INSTDIR%, but they may or may not be surrounded by quotes, and they may or may not be followed by execution options. One last note: the company uses FastCopy.exe to copy files/folders down from the network, and for such a line I would like to return the folder/file being copied instead of the path containing fastcopy.exe.

Here is a sample (kind of large, to show potential issues):

[Installing .NET 3.5 Hotfix KB943326 for App1]
; *** Added NET 3.5 SP1 hotfix KB943326: resolves App1 hidden menus force laptop re-booting
1 = %INSTDIR%\ToolShare$\Sample_Toolbox\applications\.NET_3.5_Hotfix_KB943326\WindowsXP-KB943326-x86-ENU.exe /quiet /norestart

[Installing Agent 5.3.1]
1 = %INSTDIR%\ToolShare$\Sample_Toolbox\applications\AGenT_531_2.0\w7wxp_ze_20\install.exe

[Installing APR Manager 2.1]
1 = %INSTDIR%\ToolShare$\Sample_Toolbox\applications\APRManager_21_Updated_2.0\wviwxp_ze_20\install.exe

[Installing Scope Simulator]
1 = MD "C:\Temp\scope_simulator_10"
2 =  start /wait /high %INSTDIR%\ToolShare$\Site_Toolbox\Custom_Scripts\Source\fastcopy.exe /auto_close /no_confirm_del /no_confirm_stop /log=FALSE /open_window /force_start /force_close /stream=FALSE /cmd=diff "%INSTDIR%\ToolShare$\Sample_Toolbox\applications\scope_simulator_10" /to="C:\Temp\scope_simulator_10"
3 = "C:\Temp\scope_simulator_10\w7wxp_ze_10\Install.exe"
4 = RD "C:\temp\scope_simulator_10" /q /s

[Installing Log Analyzer Offline 2.6.1]
1 = %INSTDIR%\ToolShare$\Sample_Toolbox\applications\Log_Analyzer_Offline_261\wxp_ze_10\install.exe

[Installing Data Migration Script]
1 = MD "C:\Temp\Data Migration"
2 = xcopy "%INSTDIR%\ToolShare$\Sample_Toolbox\Support\Data Migration\*.*" "C:\Temp\Data Migration" /y /e
3 = xcopy "%INSTDIR%\ToolShare$\Sample_Toolbox\Support\Data Migration\Data Migration.lnk" C:\DOCUME~1\ALLUSE~1\Desktop\ /Y

I have it set to pull a 'dir \\UNCPath\*.ini' and then loop through the results with a ForEach ($INI in $Results). The line that I have been using inside the loop to try to pull the paths from each line is:

gc $ini|?{!($_ -match "^;") -and ($_ -match "%INST[^`"]*?\\.*(\.\w{3}|\.\*)(?=`"|\s|\Z)")}|%{$TestPath = $Matches[0].replace("%INSTDIR%","\\ServerName1");if(test-path $testpath){write-host "  [OK]    " -foregroundcolor Green -NoNewline}else{write-host "[Missing] " -ForegroundColor red -NoNewline};write-host "$testpath"}

This gets me almost everything I could want. What it doesn't do is return anything that does not end in either .* or a standard 3-character extension (.exe, .cmd, .jar, etc.). It also returns the fastcopy.exe path instead of the path that is being copied.

What I would like for results:

%INSTDIR%\ToolShare$\Sample_Toolbox\applications\.NET_3.5_Hotfix_KB943326\WindowsXP-KB943326-x86-ENU.exe
%INSTDIR%\ToolShare$\Sample_Toolbox\applications\AGenT_531_2.0\w7wxp_ze_20\install.exe
%INSTDIR%\ToolShare$\Sample_Toolbox\applications\APRManager_21_Updated_2.0\wviwxp_ze_20\install.exe
%INSTDIR%\ToolShare$\Sample_Toolbox\applications\scope_simulator_10
%INSTDIR%\ToolShare$\Sample_Toolbox\applications\Log_Analyzer_Offline_261\wxp_ze_10\install.exe
%INSTDIR%\ToolShare$\Sample_Toolbox\Support\Data Migration\*.*
%INSTDIR%\ToolShare$\Sample_Toolbox\Support\Data Migration\Data Migration.lnk

I do not get the scope_simulator_10 result. Instead I get the FastCopy path, and even if I strip FastCopy from the line so that only the desired path remains, that path is still not returned. Any suggestions are welcome.

The following script should work just fine.

$paths = Get-Content $ini | Foreach {
    if ($_ -match "^(?=[^;]).*?(?<delimiter>[""' ])(?<path>%INSTDIR%(?!.*?fastcopy.exe).*?)(?:\1|$)")
    {
        Write-Output $Matches["path"]
    }
}

The $paths variable will now contain all the requested paths. Note that the negative lookahead rejects any %INSTDIR% occurrence that is followed by the literal string "fastcopy.exe" anywhere later on the same line, so a path containing that string will not be found by this regular expression.
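To tie this back to the verification step in the question, here is a minimal sketch of checking each extracted path with Test-Path. It assumes $paths already holds the extracted strings (two placeholder values are filled in so the snippet runs on its own) and reuses the \\ServerName1 root from the question's one-liner. The names $serverRoot and $report are illustrative only.

```powershell
# Placeholder input: in practice $paths would come from the extraction script above.
$paths = @(
    '%INSTDIR%\ToolShare$\Sample_Toolbox\applications\scope_simulator_10',
    '%INSTDIR%\ToolShare$\Sample_Toolbox\Support\Data Migration\*.*'
)

# Server root taken from the question's one-liner; adjust for the real share.
$serverRoot = '\\ServerName1'

# Build one result object per path so the check and the display stay separate.
$report = foreach ($path in $paths) {
    # String.Replace is case-sensitive, so this expects %INSTDIR% in upper case.
    $testPath = $path.Replace('%INSTDIR%', $serverRoot)
    [pscustomobject]@{
        Path   = $testPath
        Exists = (Test-Path -Path $testPath)   # -Path accepts wildcards such as *.*
    }
}

# Print in the same [OK] / [Missing] style as the question.
foreach ($entry in $report) {
    if ($entry.Exists) {
        Write-Host '  [OK]    ' -ForegroundColor Green -NoNewline
    } else {
        Write-Host '[Missing] ' -ForegroundColor Red -NoNewline
    }
    Write-Host $entry.Path
}
```

Keeping the results in $report rather than writing straight to the host makes it easy to filter afterwards, for example with `$report | Where-Object { -not $_.Exists }`.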

An attempt at explaining the regular expression:

^ - match the start of the line
(?=[^;]) - positive lookahead verifying that the line does not start with a semicolon
.*? - any character, as few as possible (to remove all characters before the path we want to match)
(?<delimiter>["' ]) - named group capturing the character immediately before the path: a space, a quotation mark or an apostrophe.
(?<path> - start a named capturing group for capturing the "path"
    %INSTDIR% - matches the literal string '%INSTDIR%'
    (?!.*?fastcopy.exe) - negative lookahead verifying that the part of the line we're trying to match (which has started with %INSTDIR%) doesn't contain the word fastcopy.exe anywhere later in the string (the second time the %INSTDIR% occurs on the fastcopy line, the rest of the line does not contain the fastcopy.exe literal string).
    .*? - matches any character, as few as possible, to make sure that we stop as soon as we find a matching delimiter character below
) - ends the named capturing group "path"
(?:\1|$) - matches (in a non-capturing group) the character found by the delimiter group above (to match a quotation character, apostrophe or space, depending on what character was immediately before the %INSTDIR% literal string), or the end of the line.
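As a quick check of the walkthrough above, the pattern can be run against a few lines taken from the question's sample (the fastcopy line is shortened by dropping some of its switches). This is only a sketch: $sampleLines, $pattern and $found are illustrative names, and the pattern is written in a single-quoted string, so the apostrophe is doubled instead of the quotation mark.

```powershell
# A comment line, an unquoted path, the fastcopy line, and a quoted path with spaces.
$sampleLines = @(
    '; *** Added NET 3.5 SP1 hotfix KB943326',
    '1 = %INSTDIR%\ToolShare$\Sample_Toolbox\applications\AGenT_531_2.0\w7wxp_ze_20\install.exe',
    '2 =  start /wait /high %INSTDIR%\ToolShare$\Site_Toolbox\Custom_Scripts\Source\fastcopy.exe /auto_close /cmd=diff "%INSTDIR%\ToolShare$\Sample_Toolbox\applications\scope_simulator_10" /to="C:\Temp\scope_simulator_10"',
    '2 = xcopy "%INSTDIR%\ToolShare$\Sample_Toolbox\Support\Data Migration\*.*" "C:\Temp\Data Migration" /y /e'
)

# Same regular expression as in the script above; '' inside single quotes is one apostrophe.
$pattern = '^(?=[^;]).*?(?<delimiter>["'' ])(?<path>%INSTDIR%(?!.*?fastcopy.exe).*?)(?:\1|$)'

$found = foreach ($line in $sampleLines) {
    # $Matches is only read after a successful match, so stale values are never used.
    if ($line -match $pattern) {
        $Matches['path']
    }
}

$found
```

The comment line yields nothing, the unquoted path runs to the end of the line, the fastcopy line yields the quoted scope_simulator_10 folder, and the xcopy line yields the quoted source path including its spaces and the *.* wildcard.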

If anything is unclear, please add a comment below asking for clarification.
