
Powershell - Optimize CSV processing with nested JSON

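I am neither a developer nor a CSV expert. I managed to put this code together, and it does the job. To give you a quick overview: I need to process some JSON data nested inside a CSV. So I read the JSON, split it into extra columns, and then save the CSV.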

My problem is that, although this works fine, I now need to process a 1.5 GB CSV file, and I don't want the processing to take 2 days...

So I would really appreciate it if you could help me tweak the script so that it runs in a reasonable amount of time :)

$file = Get-Content -Path 'input.csv' | Select-Object -Skip 2 | ConvertFrom-Csv
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_StreamingEndpointName -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_Id -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_AppServicePlanUri -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_ImageType -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_ServiceType -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_VMName -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_UsageType -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_DatabaseAccount -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_CollectionRid -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_ResourceCategory -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_displayName -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_ACCESSED-VIA-INTERNET' -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_APP-NAME' -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_APP-TYPE' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_APPTYPE -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_CHARGECODE -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_COMMENTS -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_COUNTRY -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_EY-REGION' -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_IT-ENV' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_OWNER -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_OWNER-EMAIL' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_SERVICELINE -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_SUB-TYPE' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_TECHCONTACTS -value $null

$count=1
ForEach ($line in $file) {
Write-Output "Processing line: $count"
$count++
try{
    if ($line.AdditionalInfo -ne $null -Or $line.Tags -ne $null){
        $line.AdditionalInfo_StreamingEndpointName = ($line.AdditionalInfo | ConvertFrom-JSON).StreamingEndpointName
        $line.AdditionalInfo_Id = ($line.AdditionalInfo | ConvertFrom-JSON).Id
        $line.AdditionalInfo_AppServicePlanUri = ($line.AdditionalInfo | ConvertFrom-JSON).AppServicePlanUri
        $line.AdditionalInfo_ImageType = ($line.AdditionalInfo | ConvertFrom-JSON).ImageType
        $line.AdditionalInfo_ServiceType = ($line.AdditionalInfo | ConvertFrom-JSON).ServiceType
        $line.AdditionalInfo_VMName = ($line.AdditionalInfo | ConvertFrom-JSON).VMName
        $line.AdditionalInfo_UsageType = ($line.AdditionalInfo | ConvertFrom-JSON).UsageType
        $line.AdditionalInfo_DatabaseAccount = ($line.AdditionalInfo | ConvertFrom-JSON).DatabaseAccount
        $line.AdditionalInfo_CollectionRid = ($line.AdditionalInfo | ConvertFrom-JSON).CollectionRid
        $line.AdditionalInfo_ResourceCategory = ($line.AdditionalInfo | ConvertFrom-JSON).ResourceCategory
        $line.Tags_displayName = ($line.Tags | ConvertFrom-JSON).displayName
        $line.'Tags_ACCESSED-VIA-INTERNET' = ($line.Tags | ConvertFrom-JSON).'ACCESSED-VIA-INTERNET'
        $line.'Tags_APP-NAME' = ($line.Tags | ConvertFrom-JSON).'APP-NAME'
        $line.'Tags_APP-TYPE' = ($line.Tags | ConvertFrom-JSON).'APP-TYPE'
        $line.Tags_APPTYPE = ($line.Tags | ConvertFrom-JSON).APPTYPE
        $line.Tags_CHARGECODE = ($line.Tags | ConvertFrom-JSON).CHARGECODE
        $line.Tags_COMMENTS = ($line.Tags | ConvertFrom-JSON).COMMENTS
        $line.Tags_COUNTRY = ($line.Tags | ConvertFrom-JSON).COUNTRY
        $line.'Tags_EY-REGION' = ($line.Tags | ConvertFrom-JSON).'EY-REGION'
        $line.'Tags_IT-ENV' = ($line.Tags | ConvertFrom-JSON).'IT-ENV'
        $line.Tags_OWNER = ($line.Tags | ConvertFrom-JSON).OWNER
        $line.'Tags_OWNER-EMAIL' = ($line.Tags | ConvertFrom-JSON).'OWNER-EMAIL'
        $line.Tags_SERVICELINE = ($line.Tags | ConvertFrom-JSON).SERVICELINE
        $line.'Tags_SUB-TYPE' = ($line.Tags | ConvertFrom-JSON).'SUB-TYPE'
        $line.Tags_TECHCONTACTS = ($line.Tags | ConvertFrom-JSON).TECHCONTACTS
        }
    }
    catch {}
}

#write-output $info
$file | Export-Csv 'C:\output.csv' -NoTypeInformation

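Add-Member performs poorly. So does saving everything to a variable and going over it again and again. You may have better luck keeping everything in a single pipeline and using Select-Object with calculated properties: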

Get-Content -Path 'input.csv' | 
    Select-Object -Skip 2 | 
    ConvertFrom-Csv | 
    Select-Object -ErrorAction SilentlyContinue -Property *,
        @{n = 'AdditionalInfo_StreamingEndpointName' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).StreamingEndpointName}},
        @{n = 'AdditionalInfo_Id'                    ; e = {($_.AdditionalInfo | ConvertFrom-JSON).Id}},
        @{n = 'AdditionalInfo_AppServicePlanUri'     ; e = {($_.AdditionalInfo | ConvertFrom-JSON).AppServicePlanUri}},
        @{n = 'AdditionalInfo_ImageType'             ; e = {($_.AdditionalInfo | ConvertFrom-JSON).ImageType}},
        @{n = 'AdditionalInfo_ServiceType'           ; e = {($_.AdditionalInfo | ConvertFrom-JSON).ServiceType}},
        @{n = 'AdditionalInfo_VMName'                ; e = {($_.AdditionalInfo | ConvertFrom-JSON).VMName}},
        @{n = 'AdditionalInfo_UsageType'             ; e = {($_.AdditionalInfo | ConvertFrom-JSON).UsageType}},
        @{n = 'AdditionalInfo_DatabaseAccount'       ; e = {($_.AdditionalInfo | ConvertFrom-JSON).DatabaseAccount}},
        @{n = 'AdditionalInfo_CollectionRid'         ; e = {($_.AdditionalInfo | ConvertFrom-JSON).CollectionRid}},
        @{n = 'AdditionalInfo_ResourceCategory'      ; e = {($_.AdditionalInfo | ConvertFrom-JSON).ResourceCategory}},
        @{n = 'Tags_displayName'                     ; e = {($_.Tags | ConvertFrom-JSON).displayName}},
        @{n = 'Tags_ACCESSED-VIA-INTERNET'           ; e = {($_.Tags | ConvertFrom-JSON).'ACCESSED-VIA-INTERNET'}},
        @{n = 'Tags_APP-NAME'                        ; e = {($_.Tags | ConvertFrom-JSON).'APP-NAME'}},
        @{n = 'Tags_APP-TYPE'                        ; e = {($_.Tags | ConvertFrom-JSON).'APP-TYPE'}},
        @{n = 'Tags_APPTYPE'                         ; e = {($_.Tags | ConvertFrom-JSON).APPTYPE}},
        @{n = 'Tags_CHARGECODE'                      ; e = {($_.Tags | ConvertFrom-JSON).CHARGECODE}},
        @{n = 'Tags_COMMENTS'                        ; e = {($_.Tags | ConvertFrom-JSON).COMMENTS}},
        @{n = 'Tags_COUNTRY'                         ; e = {($_.Tags | ConvertFrom-JSON).COUNTRY}},
        @{n = 'Tags_EY-REGION'                       ; e = {($_.Tags | ConvertFrom-JSON).'EY-REGION'}},
        @{n = 'Tags_IT-ENV'                          ; e = {($_.Tags | ConvertFrom-JSON).'IT-ENV'}},
        @{n = 'Tags_OWNER'                           ; e = {($_.Tags | ConvertFrom-JSON).OWNER}},
        @{n = 'Tags_OWNER-EMAIL'                     ; e = {($_.Tags | ConvertFrom-JSON).'OWNER-EMAIL'}},
        @{n = 'Tags_SERVICELINE'                     ; e = {($_.Tags | ConvertFrom-JSON).SERVICELINE}},
        @{n = 'Tags_SUB-TYPE'                        ; e = {($_.Tags | ConvertFrom-JSON).'SUB-TYPE'}},
        @{n = 'Tags_TECHCONTACTS'                    ; e = {($_.Tags | ConvertFrom-JSON).TECHCONTACTS}} | 
    Export-Csv 'C:\output.csv' -NoTypeInformation

Come to think of it, this may be faster still:

Get-Content -Path 'input.csv' | 
    Select-Object -Skip 2 | 
    ConvertFrom-Csv | 
    ForEach-Object {
        $AdditionalInfo = $_.AdditionalInfo | ConvertFrom-Json;
        $Tags = $_.Tags | ConvertFrom-Json;
        $_ | Select-Object -Property *,
            @{n = 'AdditionalInfo_StreamingEndpointName' ; e = {$AdditionalInfo.StreamingEndpointName}},
            @{n = 'AdditionalInfo_Id'                    ; e = {$AdditionalInfo.Id}},
            ...
    } | Export-Csv ...

This way, you only convert each JSON column once per row.
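As a fuller illustration of the once-per-row idea, here is a minimal sketch that builds each output row from an ordered dictionary instead of calculated properties. The function name Expand-JsonColumns and its two key-list parameters are hypothetical (they are not part of the original script), and the default key lists are shortened; pass the full set of names you need. Rows with empty or malformed JSON simply get empty values in the new columns, which is roughly what the empty catch block in the question did.

```powershell
# Hypothetical helper (not from the original script): expands the JSON held in
# the AdditionalInfo and Tags columns into flat, prefixed columns.
function Expand-JsonColumns {
    param(
        [Parameter(ValueFromPipeline = $true)] $Row,
        # Shortened example lists; extend them with every key you want as a column.
        [string[]] $AdditionalInfoKeys = @('StreamingEndpointName', 'Id', 'VMName'),
        [string[]] $TagKeys = @('displayName', 'APP-NAME', 'OWNER')
    )
    process {
        # Parse each JSON column once per row. Empty or invalid JSON leaves $null.
        $info = $null
        $tags = $null
        if ($Row.AdditionalInfo) {
            try { $info = $Row.AdditionalInfo | ConvertFrom-Json -ErrorAction Stop } catch { }
        }
        if ($Row.Tags) {
            try { $tags = $Row.Tags | ConvertFrom-Json -ErrorAction Stop } catch { }
        }

        # Copy the original columns first, then append the new ones in a fixed order.
        $out = [ordered]@{}
        foreach ($p in $Row.PSObject.Properties) { $out[$p.Name] = $p.Value }
        foreach ($k in $AdditionalInfoKeys) { $out["AdditionalInfo_$k"] = $info.$k }
        foreach ($k in $TagKeys)            { $out["Tags_$k"] = $tags.$k }

        [pscustomobject]$out
    }
}

# Possible usage, mirroring the pipeline above:
# Get-Content -Path 'input.csv' |
#     Select-Object -Skip 2 |
#     ConvertFrom-Csv |
#     Expand-JsonColumns |
#     Export-Csv 'C:\output.csv' -NoTypeInformation
```

Every row gets the same set of properties in the same order, which matters because Export-Csv takes its column list from the first object it receives.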

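However, I suspect that to get decent performance you will have to write something using .NET methods. I would suggest using Microsoft.VisualBasic.FileIO.TextFieldParser to parse the CSV line by line, and possibly JSON.Net to deserialize the JSON via JsonConvert.DeserializeObject(). Even then, it is not going to be fast: 1.5 GB has to be several million rows. You may be better off importing the entire CSV into SQL Server 2016+ and running a query there that uses its built-in JSON parsing.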
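To give a rough idea of the TextFieldParser route, here is a minimal sketch that streams a CSV file row by row from PowerShell. The function name Read-CsvRows and its SkipLines parameter are hypothetical. It assumes that Add-Type -AssemblyName Microsoft.VisualBasic resolves on your PowerShell version, and it only covers the CSV reading side: deserializing the JSON (with JSON.Net or ConvertFrom-Json) and writing the output are left to the caller. I have not measured it against the 1.5 GB file.

```powershell
# Assumption: this assembly name resolves on both Windows PowerShell and PowerShell 7.
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical helper: yields one ordered dictionary per CSV data row.
function Read-CsvRows {
    param(
        # Use a full path: .NET resolves relative paths against the process
        # working directory, not the current PowerShell location.
        [Parameter(Mandatory = $true)] [string] $Path,
        # Number of raw lines to discard before the header row.
        [int] $SkipLines = 0
    )
    $parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser -ArgumentList $Path
    try {
        $parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
        $parser.SetDelimiters(',')
        $parser.HasFieldsEnclosedInQuotes = $true

        # Skip the leading non-CSV lines, then read the header row.
        for ($i = 0; $i -lt $SkipLines; $i++) { [void]$parser.ReadLine() }
        $header = $parser.ReadFields()

        while (-not $parser.EndOfData) {
            $fields = $parser.ReadFields()
            $row = [ordered]@{}
            for ($i = 0; $i -lt $header.Count; $i++) { $row[$header[$i]] = $fields[$i] }
            $row   # emitted as a single object; dictionaries are not enumerated
        }
    }
    finally {
        $parser.Close()
    }
}
```

Because rows are emitted one at a time, the whole file never has to sit in memory, for example: Read-CsvRows -Path 'C:\input.csv' -SkipLines 2 | ForEach-Object { $_['Tags'] }.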

