Extracting the key-value pair in loop from text file using Powershell Script

I am trying to use PowerShell to capture specific key-value pairs from a text file that also contains data which does not follow the key:value pattern. Can anyone help me out? I put together the code below with help from the internet, as I am new to PowerShell. Any help will be appreciated.

Source text sample:

ResourceGroupName    : DataLake-Gen2
DataFactoryName      : dna-production-gen2
TriggerName          : TRG_RP_Optimizely_Import
TriggerRunId         : 08586050680855766354964895535CU57
TriggerType          : ScheduleTrigger
TriggerRunTimestamp  : 8/4/2020 10:59:59 AM
Status               : Succeeded
TriggeredPipelines   : {[PL_DATA_OPTIMIZELY_MART, 1f89fc3a-27b5-442e-9685-a444f751f607]}
Message              :
Properties           : {[TriggerTime, 8/4/2020 10:59:59 AM], [ScheduleTime, 8/4/2020 11:00:00 AM], [triggerObject, {
                         "name": "Trigger_421B8CAF-BE66-42CF-83DA-E3028693F304",
                         "startTime": "2020-08-04T10:59:59.8982174Z",
                         "endTime": "2020-08-04T10:59:59.8982174Z",
                         "scheduledTime": "2020-08-04T11:00:00Z",
                         "trackingId": "fdf58bb2-ecd5-4fe9-b2ef-d94fd71729c3",
                         "clientTrackingId": "08586050680855766354964895535CU57",
                         "originHistoryName": "08586050680855766354964895535CU57",
                         "code": "OK",
                         "status": "Succeeded"
                       }]}
AdditionalProperties : {[groupId, 08586050680855766354964895535CU57]}

ResourceGroupName    : DataLake-Gen2
DataFactoryName      : dna-production-gen2
TriggerName          : TRG_RP_Optimizely_Import
TriggerRunId         : 08586049816852049265494275953CU24
TriggerType          : ScheduleTrigger
TriggerRunTimestamp  : 8/5/2020 11:00:00 AM
Status               : Succeeded
TriggeredPipelines   : {[PL_DATA_OPTIMIZELY_MART, dd6b5beb-b7f6-44ef-8903-34c845003dfc]}
Message              :
Properties           : {[TriggerTime, 8/5/2020 11:00:00 AM], [ScheduleTime, 8/5/2020 11:00:00 AM], [triggerObject, {
                         "name": "Trigger_421B8CAF-BE66-42CF-83DA-E3028693F304",
                         "startTime": "2020-08-05T11:00:00.2662252Z",
                         "endTime": "2020-08-05T11:00:00.2662252Z",
                         "scheduledTime": "2020-08-05T11:00:00Z",
                         "trackingId": "ba223bbd-8cb2-40e8-951f-87130dbbbfe8",
                         "clientTrackingId": "08586049816852049265494275953CU24",
                         "originHistoryName": "08586049816852049265494275953CU24",
                         "code": "OK",
                         "status": "Succeeded"
                       }]}
AdditionalProperties : {[groupId, 08586049816852049265494275953CU24]}

Code used so far:

[CmdletBinding()]
Param(
    [Parameter(Mandatory=$true)]
    $path
)

function Format-LogFile {
    [CmdletBinding()]
    param (
        $log
    )

    $targets = 'TriggerRunTimestamp','ResourceGroupName', 'DataFactoryName', 'TriggerName', 'TriggerRunId', 'TriggerType', 'Status'
    [System.Collections.ArrayList]$lines = @()
    $log | ForEach-Object {
        $line = $_
        $targets | ForEach-Object {
            if ($line.Contains($_) -and $line -notin $lines) {
                $lines.Add($line) | Out-Null
            }
        }
    }
#    $lines[0] = $lines[0].TrimStart("JournalSMS  ")
#    return $lines
    
}


function Get-LogFields {
    [CmdletBinding()]
    param (

        $lines
    )
    $targets = 'TriggerRunTimestamp','ResourceGroupName', 'DataFactoryName', 'TriggerName', 'TriggerRunId', 'TriggerType', 'Status'
    $matchs = $lines | Select-String -Pattern "(?<=(\s||\b))[A-Z][\s\[A-Z]/]+?\s*?\:\s+[^\s\b]+" -AllMatches 
    
    $dict = @{}
    $matchs.Matches | ForEach-Object {
        $val = $_.Value
        $arr = $val.Split("")
        if ($arr[0].Trim() -in $targets)  {
            $dict.Add($arr[0].Trim(), $arr[1].Trim())
        } 
    }
    
    return $dict
}


$log = get-content 'D:\\output.txt'
$path = "D:\\output.txt"
$info = Get-ChildItem -File -Recurse -Path $path | ForEach-Object {
    $log = Get-Content $_.FullName -Encoding Default
    $lines = Format-LogFile $log
    $dict = Get-LogFields $lines
    $values = New-Object -TypeName psobject -Property $dict
    return $values
} 



# $info |
# Select-Object   @{name='TriggerRunTimestamp';expression={$_.'TriggerRunTimestamp'}},
#                 @{name='ResourceGroupName';expression={$_."ResourceGroupName"}},
#                 @{name='DataFactoryName';expression={$_.'DataFactoryName'}},
#                 @{name='TriggerName';expression={$_.'TriggerName'}},
#                 @{name='TriggerRunId';expression={$_.'TriggerRunId'}} 
#                  @{name='TriggerType';expression={$_.'TriggerType'}}
#                 @{name='Status';expression={$_.'Status'}}|
# Export-Csv -Encoding UTF8 -Path .\result.csv -Force


$info |
Select-Object   'TriggerRunTimestamp', "ResourceGroupName", 'DataFactoryName',
                'TriggerName', 'TriggerRunId', 'TriggerType', 'Status' |
ConvertTo-CSV -Delimiter ";" -NoTypeInformation |
% {$_.Replace('"','')} |
Set-Content -Path 'D:\\result.csv' -Force
# Export-Csv -Encoding UTF8 -Path .\result.csv -Force

Expected output:

TriggerRunTimestamp | ResourceGroupName | DataFactoryName | TriggerName | TriggerRunId | TriggerType | Status | TriggeredPipeline | Properties_TriggerTime | Properties_ScheduleTime | triggerObject_name | triggerObject_startTime | triggerObject_endTime | triggerObject_scheduledTime
8/4/2020 10:59 | DataLake-Gen2 | dna-production-gen2 | TRG_RP_Optimizely_Import | 08586050680855766354964895535CU57 | ScheduleTrigger | Succeeded | PL_DATA_OPTIMIZELY_MART | 8/4/2020 10:59 | 8/4/2020 11:00 | Trigger_421B8CAF-BE66-42CF-83DA-E3028693F304 | 2020-08-04T10:59:59.8982174Z | 2020-08-04T10:59:59.8982174Z | 2020-08-04T11:00:00Z

NOTE: The first row holds the column headers and the second row holds the values.

Help much needed!

Thanks

The problematic part of this log file is the Properties property, which holds a multiline, JSON-like string. Luckily, you don't want any of it in your output CSV file, so the code below should work:

# read the file as a single, multiline string using the -Raw switch
$log = Get-Content -Path 'D:\Test\the_input_log.txt' -Raw
# split the content into several blocks on the empty line, skip blocks that do not contain text
$result = ($log -split '(\r?\n){2,}' | Where-Object {$_ -match '\S'}) | ForEach-Object {
    # split the block to get only the part with the properties you are interested in
    # replace ' : ' with an equals sign (mind the spaces around the colon, otherwise
    # you would also replace the colons in the 'TriggerRunTimestamp' value).

    # use ConvertFrom-StringData cmdlet to create a Hashtable from this and convert that to a PsCustomObject
    # finally, use Select-Object to output a new PSObject with only the properties you need in the wanted order.
    [PsCustomObject](($_ -split 'TriggeredPipelines')[0] -replace ' : ', '=' | ConvertFrom-StringData)  |
    Select-Object 'TriggerRunTimestamp', 'ResourceGroupName', 'DataFactoryName', 'TriggerName', 'TriggerRunId', 'TriggerType', 'Status'
}

# output on screen
$result | Format-Table -AutoSize

# write to CSV file
$result | Export-Csv -Path 'D:\Test\result.csv' -Encoding UTF8 -NoTypeInformation -Force

I have added quite a few comments to the code to hopefully make clear what is going on in there.
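To see the individual steps in isolation, here is a minimal sketch that runs the same split / replace / ConvertFrom-StringData chain against an inline sample instead of a file. The sample is a shortened, hypothetical version of the log from the question (the pipeline ids are truncated), and the variable names are only for illustration.

```powershell
# Shortened, hypothetical two-record sample in the same shape as the log in the question
$sample = @'
ResourceGroupName    : DataLake-Gen2
TriggerRunTimestamp  : 8/4/2020 10:59:59 AM
Status               : Succeeded
TriggeredPipelines   : {[PL_DATA_OPTIMIZELY_MART, 1f89fc3a]}

ResourceGroupName    : DataLake-Gen2
TriggerRunTimestamp  : 8/5/2020 11:00:00 AM
Status               : Succeeded
TriggeredPipelines   : {[PL_DATA_OPTIMIZELY_MART, dd6b5beb]}
'@

# 1) one block per record: split on runs of two or more line breaks
#    (non-capturing group, so the separators themselves are not returned)
$blocks = $sample -split '(?:\r?\n){2,}' | Where-Object { $_ -match '\S' }

$records = foreach ($block in $blocks) {
    # 2) keep only the text in front of 'TriggeredPipelines'
    $head = ($block -split 'TriggeredPipelines')[0]
    # 3) 'Key   : Value' becomes 'Key  =Value'; ConvertFrom-StringData trims the spaces
    $data = $head -replace ' : ', '=' | ConvertFrom-StringData
    # 4) turn the Hashtable into an object
    [PsCustomObject]$data
}

$records | Select-Object TriggerRunTimestamp, ResourceGroupName, Status
```

One caveat with this approach: ConvertFrom-StringData treats the backslash as an escape character, so it only suits values like the ones shown here. A value containing a Windows path would need its backslashes doubled first.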

The resulting CSV file will contain quotes:

"TriggerRunTimestamp","ResourceGroupName","DataFactoryName","TriggerName","TriggerRunId","TriggerType","Status"
"8/4/2020 10:59:59 AM","DataLake-Gen2","dna-production-gen2","TRG_RP_Optimizely_Import","08586050680855766354964895535CU57","ScheduleTrigger","Succeeded"
"8/5/2020 11:00:00 AM","DataLake-Gen2","dna-production-gen2","TRG_RP_Optimizely_Import","08586049816852049265494275953CU24","ScheduleTrigger","Succeeded"

If you absolutely do not want quotes and you are using PowerShell version 7, you can add -UseQuotes AsNeeded to the Export-Csv cmdlet.
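As a small illustration of that switch, the sketch below uses ConvertTo-Csv (which accepts the same -UseQuotes parameter in PowerShell 7) so the lines can be inspected without writing a file. The row is a hand-made object with values copied from the sample log.

```powershell
# Requires PowerShell 7 or later; -UseQuotes does not exist in Windows PowerShell 5.1
$row = [PsCustomObject]@{
    TriggerRunTimestamp = '8/4/2020 10:59:59 AM'
    TriggerName         = 'TRG_RP_Optimizely_Import'
    Status              = 'Succeeded'
}

# AsNeeded quotes a field only when it would otherwise break the CSV,
# for example when it contains the delimiter
$csv = $row | ConvertTo-Csv -UseQuotes AsNeeded
$csv
```

The same parameter can be added to the Export-Csv call above without changing anything else.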

For older PowerShell versions, you can use my function ConvertTo-CsvNoQuotes.
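The function itself is not reproduced here. As a rough idea of how quote-free output can be produced on older versions, here is a minimal sketch of a hypothetical helper, Join-CsvFields. It is an assumption written for illustration, not the ConvertTo-CsvNoQuotes function mentioned above. Unlike stripping every quote with .Replace('"',''), it still quotes a field when leaving the quotes out would corrupt the file.

```powershell
# Hypothetical helper for illustration only; NOT the answerer's ConvertTo-CsvNoQuotes
function Join-CsvFields {
    param(
        [Parameter(Mandatory)] [object[]] $InputObject,
        [string] $Delimiter = ','
    )
    # the header line comes from the property names of the first object
    $names = $InputObject[0].PSObject.Properties.Name
    $names -join $Delimiter

    foreach ($obj in $InputObject) {
        $fields = foreach ($name in $names) {
            $text = [string]$obj.$name
            # quote only when the field contains the delimiter, a quote or a line break
            if ($text.Contains($Delimiter) -or $text -match '["\r\n]') {
                '"{0}"' -f $text.Replace('"', '""')
            } else {
                $text
            }
        }
        $fields -join $Delimiter
    }
}

$rows = @(
    [PsCustomObject]@{ TriggerName = 'TRG_RP_Optimizely_Import'; Status = 'Succeeded' }
    [PsCustomObject]@{ TriggerName = 'a,b'; Status = 'Failed' }
)
$lines = Join-CsvFields -InputObject $rows
$lines
```

The output lines can be written with Set-Content, as in the question's own script.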


Edit

As per your comment, you also need properties from the elements that appear to be JSON, so you need a completely different approach.

For the example you have given, you can use:

# read the file as a single, multiline string using the -Raw switch
$log = Get-Content -Path 'D:\Test\the_input_log.txt' -Raw
# split the content into several blocks on the empty line, skip blocks that do not contain text
$result = ($log -split '(\r?\n){2,}' | Where-Object {$_ -match '\S'}) | ForEach-Object {
    # create a Hashtable to store the key/value properties we find looping over each line in the block
    $hash = @{}
    switch -Regex ($_.Trim() -split '\r?\n') {
        '^(\w+)\s+:\s*(.*)' { $key = $matches[1]; $hash[$key] = $matches[2] }     # found a key/value property
        '^\s+(\S.+)'        { if ($key) {$hash[$key] += ("`r`n"+ $matches[1])} }  # add to a multiline property
    }
    # test if the above actually was able to parse 'TriggeredPipelines'
    if (![string]::IsNullOrWhiteSpace($hash['TriggeredPipelines'])) {
        # remove the brackets from TriggeredPipelines and keep only the pipeline name
        $hash['TriggeredPipeline'] = ($hash['TriggeredPipelines'].Trim("{[]}") -split ',')[0]
    }

    # test if the above actually was able to parse 'Properties'
    if (![string]::IsNullOrWhiteSpace($hash['Properties'])) {
        # the 'Properties' property needs a bit more work:
        # 1) remove the surrounding brackets, split into the first line and a textblock with the rest of the properties
        $props = $hash['Properties'].Trim("{[ ]}") -split '\r?\n', 2
        # $props[0] is now "TriggerTime, 8/4/2020 10:59:59 AM], [ScheduleTime, 8/4/2020 11:00:00 AM], [triggerObject, {"
        # parse the TriggerTime and ScheduleTime from that line and add them to the hash
        $temp  = ([regex]'(?i)TriggerTime,\s*([^\]]+)').Matches($props[0]).Groups[1].Value
        if (![string]::IsNullOrWhiteSpace($temp)) { $hash['Properties_TriggerTime'] = $temp }

        $temp = ([regex]'(?i)ScheduleTime,\s*([^\]]+)').Matches($props[0]).Groups[1].Value
        if (![string]::IsNullOrWhiteSpace($temp)) { $hash['Properties_ScheduleTime'] = $temp }

        if ($props.Count -eq 2) {
            # 2) surround $props[1] with curly brackets, so it will become valid JSON and convert from that
            $props = '{{{0}}}' -f $props[1] | ConvertFrom-Json
            # loop through the properties and add these to the hash with "TriggerObject_" prefix
            foreach($p in $props.PSObject.Properties.name) {
                $hash["TriggerObject_$p"] = $props.$p
            }
        }
    }

    # final test to see if we have managed to capture anything
    # more strict but memory consuming would be 
    # if ($hash.Count -and ![string]::IsNullOrWhiteSpace(-join $hash.Values)) {..}

    if ($hash.Count) {
        # convert the completed hash into a PSObject and select the properties you need from it
        [PsCustomObject]$hash | Select-Object 'TriggerRunTimestamp', 'ResourceGroupName', 'DataFactoryName',
                                              'TriggerName', 'TriggerRunId', 'TriggerType', 'Status', 
                                              'TriggeredPipeline', 'Properties_TriggerTime', 'Properties_ScheduleTime',
                                              'TriggerObject_name', 'TriggerObject_startTime', 
                                              'TriggerObject_endTime', 'TriggerObject_scheduledTime'
    }
}

# output on screen (won't fit as Table in the console)
$result

# write to CSV file
$result | Export-Csv -Path 'D:\Test\result.csv' -Encoding UTF8 -NoTypeInformation -Force

The resulting CSV file will now look like this:

"TriggerRunTimestamp","ResourceGroupName","DataFactoryName","TriggerName","TriggerRunId","TriggerType","Status","TriggeredPipeline","Properties_TriggerTime","Properties_ScheduleTime","TriggerObject_name","TriggerObject_startTime","TriggerObject_endTime","TriggerObject_scheduledTime"
"8/4/2020 10:59:59 AM","DataLake-Gen2","dna-production-gen2","TRG_RP_Optimizely_Import","08586050680855766354964895535CU57","ScheduleTrigger","Succeeded","PL_DATA_OPTIMIZELY_MART","8/4/2020 10:59:59 AM","8/4/2020 11:00:00 AM","Trigger_421B8CAF-BE66-42CF-83DA-E3028693F304","2020-08-04T10:59:59.8982174Z","2020-08-04T10:59:59.8982174Z","2020-08-04T11:00:00Z"
"8/5/2020 11:00:00 AM","DataLake-Gen2","dna-production-gen2","TRG_RP_Optimizely_Import","08586049816852049265494275953CU24","ScheduleTrigger","Succeeded","PL_DATA_OPTIMIZELY_MART","8/5/2020 11:00:00 AM","8/5/2020 11:00:00 AM","Trigger_421B8CAF-BE66-42CF-83DA-E3028693F304","2020-08-05T11:00:00.2662252Z","2020-08-05T11:00:00.2662252Z","2020-08-05T11:00:00Z"
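The two less obvious steps in that script are collecting continuation lines into the preceding key and wrapping the remainder in curly brackets so it parses as JSON. Here is a minimal sketch of just those two steps, run against a shortened, hypothetical block (the trigger name is truncated and most members are left out).

```powershell
# Shortened, hypothetical block in the same shape as one record of the log
$block = @'
Status               : Succeeded
Properties           : {[TriggerTime, 8/4/2020 10:59:59 AM], [triggerObject, {
                         "name": "Trigger_421B8CAF",
                         "code": "OK"
                       }]}
'@

$hash = @{}
$key  = $null
switch -Regex ($block -split '\r?\n') {
    # a line starting with a word followed by ' : ' begins a new key
    '^(\w+)\s+:\s*(.*)' { $key = $matches[1]; $hash[$key] = $matches[2] }
    # an indented line belongs to the key seen last
    '^\s+(\S.*)'        { if ($key) { $hash[$key] += "`n" + $matches[1] } }
}

# drop the outer '{[' and '}]}', then separate the first line from the JSON members
$first, $rest = $hash['Properties'].Trim('{[ ]}') -split '\r?\n', 2

# '{{' and '}}' are literal braces in a format string, so this yields '{ ...members... }'
$trigger = '{{{0}}}' -f $rest | ConvertFrom-Json
$trigger.name
```

The sample deliberately contains no timestamps. As far as I know, ConvertFrom-Json in PowerShell 6 and later may turn ISO 8601 strings such as startTime into DateTime objects, which would change how those columns are formatted in the CSV compared with Windows PowerShell 5.1.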


This works in PowerShell 5 (I don't know about lower versions). We can use the -match comparison operator to break a string that follows a pattern into a "Key" and "Value" pair. Mostly, the need arises when working with JSON objects.


PS C:\Users> $str = '"KeyStr": "ValueString"'
PS C:\Users> $str -match '(?<key>.+):(?<value>.+)'
True
PS C:\Users> # $Matches is inbuilt variable in PowerShell
PS C:\Users> $Matches

Name                           Value
----                           -----
key                            "KeyStr"
value                           "ValueString"
0                              "KeyStr": "ValueString"


PS C:\Users> $Matches.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Hashtable                                System.Object


PS C:\Users> $Matches.key
"KeyStr"
PS C:\Users> $Matches.Value
 "ValueString"
PS C:\Users>
------------------------------------------

For more help, check the PowerShell help:

"Get-Help about_Comparison_Operators"
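Applied to a line from the log in the question, the same named-group idea could look like the sketch below. The pattern is my own assumption for that log format. It is anchored and only allows word characters in the key, because the greedy '(?<key>.+):(?<value>.+)' pattern would split a timestamp line at its last colon.

```powershell
$line = 'TriggerRunTimestamp  : 8/4/2020 10:59:59 AM'

# the key is the leading word, the value is everything after the first ' : '
if ($line -match '^(?<key>\w+)\s+:\s*(?<value>.*)$') {
    $pair = [PsCustomObject]@{ Key = $Matches.key; Value = $Matches.value }
}
$pair

# for comparison: the greedy pattern backtracks to the LAST colon in the line
$null = $line -match '(?<key>.+):(?<value>.+)'
$greedyValue = $Matches.value
```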
