
PowerShell regex to extract SID from filename

I have an array $vhdlist with contents similar to the following filenames:

UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx
UVHD-S-1-5-21-8746256374-654813465-374012747-6175.vhdx
UVHD-S-1-5-21-8746256374-654813465-374012747-8147.vhdx
UVHD-template.vhdx

I want to use a regex and be left with an array containing only SID portion of the filenames.

I am using the following:

$sids = foreach ($file in $vhdlist) 
{
[regex]::split($file, '^UVHD-(?:([(\d)(\w)-]+)).vhdx$')
}

There are 2 problems with this: in the resulting array there are 3 blank lines for every SID; and the "template" filename matches (the resulting line in the output is just "template"). How can I get an array of SIDs as the output and not include the "template" line?

You seem to want to filter the list down to those filenames that contain an SID. Filtering is done with Where-Object (where for short); you don't need a loop.

An SID could be described as "S- and then a bunch of digits and dashes" for this simple case. That leaves us with ^UVHD-S-[\d-]*\.vhdx$ for the filename.

In combination we get:

$vhdlist | where { $_ -Match "^UVHD-S-[\d-]*\.vhdx$" }
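To try the filter without touching the file system, here is a minimal sketch that rebuilds $vhdlist from the four sample names in the question. The variable $sidFiles is a name made up for this illustration.

```powershell
# Sample data copied from the question; it stands in for the real $vhdlist.
$vhdlist = @(
    'UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx'
    'UVHD-S-1-5-21-8746256374-654813465-374012747-6175.vhdx'
    'UVHD-S-1-5-21-8746256374-654813465-374012747-8147.vhdx'
    'UVHD-template.vhdx'
)

# Keep only the names that look like 'UVHD-<SID>.vhdx'.
# Single quotes keep PowerShell from touching the '$' in the pattern.
# @(...) forces an array even if only one name survives the filter.
$sidFiles = @($vhdlist | Where-Object { $_ -match '^UVHD-S-[\d-]*\.vhdx$' })

$sidFiles.Count   # 3: the template file is filtered out
```

Note that this still returns whole filenames; it only removes the ones without an SID.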

When you don't really have an array of strings, but actually an array of files, use them directly.

dir C:\some\folder | where { $_.Name -Match "^UVHD-S-[\d-]*\.vhdx$" }

Or, possibly you can even make it as simple as:

dir C:\some\folder\UVHD-S-*.vhdx

EDIT

Extracting the SIDs from a list of strings can be thought of as a combined transformation (for each element, extract the SID) and filter (remove non-matches) operation.

PowerShell's ForEach-Object cmdlet (foreach for short) works like map() in other languages. It takes every input element and returns a new value. In effect it transforms a list of input elements into output elements. Together with the -replace operator you can extract SIDs this way.

$vhdlist | foreach { $_ -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1" } | where { $_ -gt "" }

The regex back-reference for .NET languages is $1. The $ is a special character in double-quoted PowerShell strings, so it needs to be escaped there, except when there is no ambiguity. The backtick is the PowerShell escape character. You can escape the $ in the regex as well, but there it's not necessary.
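The effect of the quoting is easy to check on a single name. The sketch below compares three ways of writing the replacement string; $name, $regex and the three result variables are hypothetical names used only here, and the last case assumes no variable called $1 has been set in the session.

```powershell
$name  = 'UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx'
$regex = '^UVHD-(S-[\d-]*)\.vhdx$'

# Single quotes: PowerShell passes the literal text $1 to the regex engine.
$single  = $name -replace $regex, '$1'

# Double quotes with a backtick: the escaped $ also reaches the engine as $1.
$escaped = $name -replace $regex, "`$1"

# Double quotes without escaping: PowerShell expands its own variable $1
# first (normally unset), so the replacement string is empty.
$bare    = $name -replace $regex, "$1"

$single    # S-1-5-21-8746256374-654813465-374012747-4533
$escaped   # S-1-5-21-8746256374-654813465-374012747-4533
$bare      # (empty string)
```

Single-quoted strings avoid the question entirely, which is why they are a common choice for regex patterns and replacements.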

As a final step we use where to remove empty strings (i.e. non-matches). Doing it this way around means we only need to apply the regex once, instead of twice when filtering first and replacing second.

PowerShell operators can also work on lists directly. So the above could even be shortened:

$vhdlist -replace "^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$","`$1" | where { $_ -gt "" }

The shorter version only works on lists of actual strings or objects that produce the right thing when .ToString() is called on them.
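As a worked example, here is a sketch of the list form run against the sample names from the question. It assumes $vhdlist holds plain strings, and it writes the pattern and replacement in single quotes so that no backtick is needed.

```powershell
# Sample data copied from the question.
$vhdlist = @(
    'UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx'
    'UVHD-S-1-5-21-8746256374-654813465-374012747-6175.vhdx'
    'UVHD-S-1-5-21-8746256374-654813465-374012747-8147.vhdx'
    'UVHD-template.vhdx'
)

# -replace runs on every element of the array.
# SID filenames collapse to group 1; anything else matches '.*' and
# collapses to an empty string, which Where-Object then drops.
$sids = @(
    $vhdlist -replace '^(?:UVHD-(S-[\d-]*)\.vhdx|.*)$', '$1' |
        Where-Object { $_ -gt '' }
)

$sids
# S-1-5-21-8746256374-654813465-374012747-4533
# S-1-5-21-8746256374-654813465-374012747-6175
# S-1-5-21-8746256374-654813465-374012747-8147
```

The "anything else" branch matters here: without it, a non-matching name such as UVHD-template.vhdx would pass through -replace unchanged and survive the filter.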

Regex breakdown:

^                       # start-of-string anchor
(?:                     # begin non-capturing group (either...)
  UVHD-                 #   'UVHD-'
  (                     #   begin group 1
    S-[\d-]*            #     'S-' and however many digits and dashes
  )                     #   end group 1
  \.vhdx                #   '.vhdx'
  |                     #    ...or...
  .*                    #   anything else
)                       # end non-capturing group
$                       # end-of-string anchor
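If you want to keep the comments next to the pattern in a script, .NET regexes accept the inline (?x) option, which ignores unescaped whitespace and treats # as the start of a comment. The following is only a sketch of that idea, not part of the original answer; $pattern and $result are hypothetical names.

```powershell
# Single-quoted here-string: nothing inside is expanded by PowerShell.
$pattern = @'
(?x)              # ignore whitespace and allow comments in the pattern
^                 # start-of-string anchor
(?:               # either...
  UVHD-           #   'UVHD-'
  ( S-[\d-]* )    #   group 1: 'S-' and however many digits and dashes
  \.vhdx          #   '.vhdx'
  |               # ...or...
  .*              #   anything else
)
$                 # end-of-string anchor
'@

$result = 'UVHD-S-1-5-21-8746256374-654813465-374012747-4533.vhdx' -replace $pattern, '$1'
$result   # S-1-5-21-8746256374-654813465-374012747-4533
```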
