
How can I write a regular expression to match a multiline pattern?

I am trying to parse an SFTP config file (sshd_config.txt) using PowerShell to get all existing SFTP profiles. The text below shows just 3 profiles, but there may be hundreds that I need to parse. Once I have a working regex that outputs all matches to a PowerShell object, I should have no problem parsing the individual profiles. I just can't figure out how to write the regex. I did try using other examples, but I couldn't separate the profiles into individual matches.

I need to match the start "^Match" and all lines until a blank line. Any help would be greatly appreciated.

Match User thedomain\user1
    ChrootDirectory E:\sftp\heatism\user1
    PermitTunnel no
    AllowAgentForwarding no
    AllowTcpForwarding no
    X11Forwarding no
    ForceCommand internal-sftp

Match User thedomain\user2
    ChrootDirectory E:\sftp\heatism\user2
    PermitTunnel no
    AllowAgentForwarding no
    AllowTcpForwarding no
    X11Forwarding no
    ForceCommand internal-sftp

Match User thedomain\user3
    ChrootDirectory E:\sftp\heatism\user3
    PermitTunnel no
    AllowAgentForwarding no
    AllowTcpForwarding no
    X11Forwarding no
    ForceCommand internal-sftp

To match across multiple lines:

  • You need a single, multi-line string as input, which is what Get-Content's -Raw switch provides (by default, Get-Content outputs the input file's lines one by one).

  • Additionally, your regex must use the following options:

    • Multiline (m), to make ^ and $ match at the beginning and end of each line inside a multi-line string. \A and \Z can then be used to match the overall beginning and end of the string; \Z also matches just before a final trailing newline, whereas \z matches only at the very end.

    • Singleline (s), if you want . to also match newline characters (\n).

    • The simplest way to specify these options is inside (?...) at the start of the regex; in the case at hand: (?ms)
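
As a rough illustration of the points above, the following sketch shows the effect of -Raw and of the m and s options on small sample inputs. The demo file name and the sample strings are made up for this example; they are not part of the original question.

```powershell
# Hypothetical demo file in the temp directory (not the real sshd_config.txt).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'demo_sshd_config.txt'
Set-Content -Path $path -Value 'Match User demo', '    PermitTunnel no'

# Default: one string per line. With -Raw: a single multi-line string.
$lineCount = @(Get-Content -Path $path).Count                  # 2
$rawType   = (Get-Content -Raw -Path $path).GetType().Name     # String

# LF-only sample text, built in memory so the line endings are predictable.
$text = "alpha`nbeta`ngamma"

# Without (?m), ^ matches only at the start of the whole string.
$noM   = [regex]::Matches($text, '^\w+').Count                 # 1
$withM = [regex]::Matches($text, '(?m)^\w+').Count             # 3

# Without (?s), . cannot cross a newline, so 'alpha.+' finds nothing.
$noS   = [regex]::Match($text, 'alpha.+').Success              # False
$withS = [regex]::Match($text, '(?s)alpha.+').Value            # the whole string

# Clean up the demo file.
Remove-Item -Path $path
```
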

To put it all together, using a modified version of Jan's helpful regex and the .NET System.Text.RegularExpressions.Regex.Matches() method, which finds all matches for a given regex (note that PowerShell's -match operator only ever looks for the first match):

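# \r? before $ lets the blank-line test also succeed in files with CRLF line
# endings, because .NET's $ matches only directly before \n.
foreach ($m in [regex]::Matches(
  (Get-Content -Raw sshd_config.txt),
  '(?ms)^Match User .+?(?=^\r?$|\Z)'
)) {
  # Split the block of lines into the first and the remaining lines.
  $first, $rest = $m.Value.Trim() -split '\r?\n'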
  # Create an aux. (ordered) hashtable and fill it with the user name
  # extracted from the first line.
  $oht = [ordered] @{ UserName = (-split $first)[-1] }
  # Loop over the remaining lines, split them into name and value,
  # and add them to the hashtable.
  foreach ($line in $rest) {
    $name, $value = $line.Trim() -split ' ', 2
    $oht[$name] = $value
  }
  # Convert the hashtable to a custom object and output it.
  [pscustomobject] $oht
}

With your sample input, you'll see the following output, representing the default formatting of the [pscustomobject] instances that were created:

UserName             : thedomain\user1
ChrootDirectory      : E:\sftp\heatism\user1
PermitTunnel         : no
AllowAgentForwarding : no
AllowTcpForwarding   : no
X11Forwarding        : no
ForceCommand         : internal-sftp

UserName             : thedomain\user2
ChrootDirectory      : E:\sftp\heatism\user2
PermitTunnel         : no
AllowAgentForwarding : no
AllowTcpForwarding   : no
X11Forwarding        : no
ForceCommand         : internal-sftp

UserName             : thedomain\user3
ChrootDirectory      : E:\sftp\heatism\user3
PermitTunnel         : no
AllowAgentForwarding : no
AllowTcpForwarding   : no
X11Forwarding        : no
ForceCommand         : internal-sftp
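
To illustrate why [regex]::Matches() is used here rather than -match, here is a minimal comparison on a made-up three-line string (the sample text and variable names are assumptions for this sketch only):

```powershell
# Made-up sample input with three "Match User" lines (LF line endings).
$text = "Match User a`nMatch User b`nMatch User c"
$pattern = '(?m)^Match User (\w+)'

# -match returns a Boolean and records only the first match in $Matches.
$null = $text -match $pattern
$firstOnly = $Matches[1]                       # a

# [regex]::Matches() returns every match in the input string.
$all = [regex]::Matches($text, $pattern) | ForEach-Object { $_.Groups[1].Value }
$all -join ','                                 # a,b,c
```
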

You could use (thanks to @mklement0 for some improvements):

(?m)^Match[\s\S]+?(?=^$|\Z)

See a demo on regex101.com and mind the multiline flag.


Explanation:

^Match    # match "Match" at the very beginning of a line
[\s\S]+?  # match one or more of any character (including newlines), lazily
(?=^$|\Z) # make sure what follows is a blank line or the very end of the string
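
The following sketch, using a made-up two-block sample, shows what the lazy quantifier contributes and how the pattern behaves in .NET (and therefore PowerShell) when the input uses CRLF line endings. In .NET, $ matches only directly before \n, so ^$ does not see a blank line that still contains a \r. The \r? variant at the end is a suggested adjustment, not part of the answer above.

```powershell
# Made-up sample: two blocks separated by one blank line, LF line endings.
$text = "Match User a`n  k v`n`nMatch User b`n  k v`n"

$lazy   = '(?m)^Match[\s\S]+?(?=^$|\Z)'
$greedy = '(?m)^Match[\s\S]+(?=^$|\Z)'

# Lazy: each match stops at the first blank line (or the end of the string).
$lazyCount   = [regex]::Matches($text, $lazy).Count      # 2
# Greedy: a single match swallows everything up to the end of the string.
$greedyCount = [regex]::Matches($text, $greedy).Count    # 1

# The same text with CRLF line endings, as is common on Windows.
$crlf = $text -replace "`n", "`r`n"

# ^$ no longer finds the blank line, so the blocks run together.
$crlfCount  = [regex]::Matches($crlf, $lazy).Count                             # 1
# Allowing an optional \r before $ restores the per-block matches.
$fixedCount = [regex]::Matches($crlf, '(?m)^Match[\s\S]+?(?=^\r?$|\Z)').Count  # 2
```

With CRLF input, each matched block may end in a stray \r, so calling .Trim() on the match value before further processing is a reasonable precaution.
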

Another option is to start the pattern with Match followed by the rest of the line, then keep matching all non-empty lines that follow by using a negative lookahead.

^Match .*(?:\r?\n(?!$).*)*

The pattern matches:

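  • ^Match Match the literal text "Match" followed by a space at the start of a line (this needs the multiline flag)
  • .* Match the rest of the line
  • (?: Non-capturing group
    • \r?\n(?!$) Match a newline and assert that the next line is not empty
    • .* Match the rest of the line
  • )* Close the non-capturing group and repeat it zero or more times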

See a regex demo.

If you also don't want to match a line that contains only spaces:

^Match .*(?:\r?\n(?![\p{Zs}\t]*$).*)*

Example

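$allText = Get-Content -Raw sshd_config.txt
# Single quotes make PowerShell pass the regex through verbatim (no $ expansion).
$pattern = '(?m)^Match .*(?:\r?\n(?![\p{Zs}\t]*$).*)*'
# -AllMatches makes Select-String report every match, not just the first one.
Select-String -Pattern $pattern -InputObject $allText -AllMatches | ForEach-Object { $_.Matches } | ForEach-Object {
    $_.Value
    "--------------------------"
}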

Output

Match User thedomain\user1
    ChrootDirectory E:\sftp\heatism\user1
    PermitTunnel no
    AllowAgentForwarding no
    AllowTcpForwarding no
    X11Forwarding no
    ForceCommand internal-sftp
--------------------------
Match User thedomain\user2
    ChrootDirectory E:\sftp\heatism\user2
    PermitTunnel no
    AllowAgentForwarding no
    AllowTcpForwarding no
    X11Forwarding no
    ForceCommand internal-sftp
--------------------------
Match User thedomain\user3
    ChrootDirectory E:\sftp\heatism\user3
    PermitTunnel no
    AllowAgentForwarding no
    AllowTcpForwarding no
    X11Forwarding no
    ForceCommand internal-sftp
--------------------------
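
To see what the whitespace-aware lookahead changes, here is a small comparison on a made-up input in which the two blocks are separated by a line containing only spaces. The sample text uses LF line endings and is an assumption for this sketch.

```powershell
# Made-up sample: the separator line between the blocks holds three spaces.
$text = "Match User a`n    X11Forwarding no`n   `nMatch User b`n    X11Forwarding no"

$strict = '(?m)^Match .*(?:\r?\n(?!$).*)*'
$loose  = '(?m)^Match .*(?:\r?\n(?![\p{Zs}\t]*$).*)*'

# (?!$) only rejects truly empty lines, so the spaces-only line is absorbed
# and both blocks end up in a single match.
$strictCount = [regex]::Matches($text, $strict).Count    # 1

# (?![\p{Zs}\t]*$) also rejects lines made up of spaces or tabs,
# so each block is matched separately.
$looseCount  = [regex]::Matches($text, $loose).Count     # 2
```
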
