简体   繁体   中英

Split Period-Delimited Nodes To JSON Object

I have many string entries (this are namespace/class trees) that look like the following:

appsystem
appsystem.applications
appsystem.applications.APPactivities
appsystem.applications.APPmanager
appsystem.applications.APPmodels
appsystem.applications.MAPmanager
appsystem.applications.MAPmanager.maphub
appsystem.applications.MAPmanager.mapmanager
appsystem.applications.pagealertsmanager
appsystem.authentication
appsystem.authentication.manager
appsystem.authentication.manager.encryptionmanager
appsystem.authentication.manager.sso
appsystem.authentication.manager.tokenmanager

But, I need the final output to be like:

{
    "name": "appsystem",
    "children": [
        {
        "name": "applications",
        "children": [
            {"name": "APPactivities"},
            {"name": "APPmanager"},
            {"name": "APPmodels"},
            {"name": "MAPmanager",
                "children": [
                    {"name": "maphub"},
                    {"name": "mapmanager"}

                ]},
            {"name": "pagealertsmanager"}   
            ]
        },
        {
        "name": "authentication",
        "children": [
            {"name": "manager",
                "children": [
                    {"name": "encryptionmanager"},
                    {"name": "sso"},
                    {"name": "tokenmanager"}
                ]}
            ]
        }
    ]
}

The total nodes can be any number.

I am assuming I am going to need recursion but I am at a loss on where even to begin.

This builds up nested lists, PowerShell ConvertTo-JSON flattens the outer list.

You can change the $Line in $s to $line in (Get-Content input.txt) .

But I think this does it:

$s = @'
appsystem
appsystem.applications
appsystem.applications.APPactivities
appsystem.applications.APPmanager
appsystem.applications.APPmodels
appsystem.applications.MAPmanager
appsystem.applications.MAPmanager.maphub
appsystem.applications.MAPmanager.mapmanager
appsystem.applications.pagealertsmanager
appsystem.authentication
appsystem.authentication.manager
appsystem.authentication.manager.encryptionmanager
appsystem.authentication.manager.sso
appsystem.authentication.manager.tokenmanager
'@ -split "`r`n"

$TreeRoot = New-Object System.Collections.ArrayList

foreach ($Line in $s) {

    $CurrentDepth = $TreeRoot

    $RemainingChunks = $Line.Split('.')
    while ($RemainingChunks)
    {

        # If there is a dictionary at this depth then use it, otherwise create one.
        $Item = $CurrentDepth | Where-Object {$_.name -eq $RemainingChunks[0]}
        if (-not $Item)
        {
            $Item = @{name=$RemainingChunks[0]}
            $null = $CurrentDepth.Add($Item)
        }

        # If there will be child nodes, look for a 'children' node, or create one.
        if ($RemainingChunks.Count -gt 1)
        {
            if (-not $Item.ContainsKey('children'))
            {
                $Item['children'] = New-Object System.Collections.ArrayList
            }

            $CurrentDepth = $Item['children']
        }

        $RemainingChunks = $RemainingChunks[1..$RemainingChunks.Count]
    }
}

$TreeRoot | ConvertTo-Json -Depth 1000

Edit: It's too slow? I tried some random pausing profiling and found (not too surprisingly) that it's the inner nested loop, which searches children arrays for matching child nodes, which is being hit too many times.

This is a redesigned version which still builds the tree, and this time it also builds a TreeMap hashtable of shortcuts into the tree, to all the previously build nodes, so it can jump right too them instead of searching the children lists for them.

I made a testing file, some 20k random lines. Original code processed it in 108 seconds, this one does it in 1.5 seconds and the output matches.

$TreeRoot = New-Object System.Collections.ArrayList
$TreeMap = @{}

foreach ($line in (Get-Content d:\out.txt)) {

    $_ = ".$line"    # easier if the lines start with a dot

    if ($TreeMap.ContainsKey($_))    # Skip duplicate lines
    { 
        continue
    }

    # build a subtree from the right. a.b.c.d.e -> e  then  d->e  then  c->d->e
    # keep going until base 'a.b' reduces to something already in the tree, connect new bit to that.
    $LineSubTree = $null
    $TreeConnectionPoint = $null

    do {
        $lastDotPos = $_.LastIndexOf('.')
        $leaf = $_.Substring($lastDotPos + 1)
        $_ = $_.Substring(0, $lastDotPos)

        # push the leaf on top of the growing subtree
        $LineSubTree = if ($LineSubTree) {
                           @{"name"=$leaf; "children"=([System.Collections.ArrayList]@($LineSubTree))}
                       } else { 
                           @{"name"=$leaf}
                       }

        $TreeMap["$_.$leaf"] = $LineSubTree

    } while (!($TreeConnectionPoint = $TreeMap[$_]) -and $_)


    # Now we have a branch built to connect in to the existing tree
    # but is there somewhere to put it?
    if ($TreeConnectionPoint)
    {
        if ($TreeConnectionPoint.ContainsKey('children'))
        {
            $null = $TreeConnectionPoint['children'].Add($LineSubTree)
        } else {
            $TreeConnectionPoint['children'] = [System.Collections.ArrayList]@($LineSubTree)
        }
    } else
    {           # nowhere to put it, this is a new root level connection
        $null = $TreeRoot.Add($LineSubTree)
    }
}

$TreeRoot | ConvertTo-Json -Depth 100

(@mklement0's code takes 103 seconds and produces a wildly different output - 5.4M characters of JSON instead of 10.1M characters of JSON. [Edit: because my code allows multiple root nodes in a list which my test file has, and their code does not allow that])


Auto-generated PS help links from my codeblock (if available):

To complement TessellatingHeckler's great answer with an alternative implementation that uses a recursive function .

The emphasis is on modularity and terseness, not performance. [1]

# Outer function that loops over all paths and builds up a one or more nested
# hashtables reflecting the path hierarchy, which are converted to JSON on output.
# Note that only a single JSON object is output if all paths share the same root
# component; otherwise, a JSON *array* is output.
function convert-PathsToNestedJsonObject([string[]] $paths) {
  $hts = New-Object Collections.ArrayList
  $paths.ForEach({
    $rootName = $_.split('.')[0] 
    $ht = $hts.Where({ $_.name -eq $rootName }, 'First')[0]
    if (-not $ht) { [void] $hts.Add(($ht = @{})) }
    convert-PathToNestedHashtable $ht $_ 
  })
  $hts | ConvertTo-Json -Depth 100
}

# Recursive helper function that takes a path such as "appsystem.applications"
# and converts it into a nested hashtable with keys "name" and "children" to
# reflect the path hierarchy. 
function convert-PathToNestedHashtable([hashtable] $ht, [string] $path) {
  $name, $childName, $rest = $path -split '\.', 3
  $ht.name = $name
  if ($childName) {
    if ($ht.children) { 
      $htChild = $ht.children.Where({ $_.name -eq $childName }, 'First')[0]
    } else {
      $ht.children = New-Object Collections.ArrayList
      $htChild = $null
    }
    if (-not $htChild) {      
      [void] $ht.children.Add(($htChild = @{}))
    }
    convert-PathToNestedHashtable $htChild "$childName.$rest" 
  }
}

# Call the outer function with the input paths (assumed to be stored in $paths).
convert-PathsToNestedJsonObject $paths

[1] One deliberate type of optimization is applied, which, however, still keeps the code terse:

PSv4+ offers the (little-known) array methods .ForEach() and .Where() , which are not only noticeably faster than their cmdlet counterparts ForEach-Object and Where-Object , but also offer additional features.

Specifically:

  • $paths.ForEach({ ... }) is used instead of
    $paths | ForEach-Object { ... }

  • $ht.children.Where({ $_.name -eq $childName }, 'First')[0] is used instead of
    $ht.children | Where-Object { $_.name -eq $childName } | Select-Object -First 1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM