PowerShell - Optimize CSV processing with nested JSON
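I'm neither a developer nor a CSV expert. I managed to put this code together, and it gets the job done. To give you a quick overview: I need to process some JSON data nested inside a CSV. So I read the JSON, split it into additional columns, and then save the CSV.
My problem now is that, although this works fine, I need to process a 1.5 GB CSV file, and I don't want the processing to take 2 days...
So if you could help me tune the script so that it runs in a reasonable time, I would really appreciate it :)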
$file = Get-Content -Path 'input.csv' | Select-Object -Skip 2 | ConvertFrom-Csv
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_StreamingEndpointName -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_Id -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_AppServicePlanUri -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_ImageType -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_ServiceType -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_VMName -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_UsageType -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_DatabaseAccount -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_CollectionRid -value $null
$file | Add-Member -MemberType NoteProperty -Name AdditionalInfo_ResourceCategory -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_displayName -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_ACCESSED-VIA-INTERNET' -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_APP-NAME' -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_APP-TYPE' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_APPTYPE -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_CHARGECODE -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_COMMENTS -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_COUNTRY -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_EY-REGION' -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_IT-ENV' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_OWNER -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_OWNER-EMAIL' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_SERVICELINE -value $null
$file | Add-Member -MemberType NoteProperty -Name 'Tags_SUB-TYPE' -value $null
$file | Add-Member -MemberType NoteProperty -Name Tags_TECHCONTACTS -value $null
$count=1
ForEach ($line in $file) {
Write-Output "Processing line: $count"
$count++
try{
if ($line.AdditionalInfo -ne $null -Or $line.Tags -ne $null){
$line.AdditionalInfo_StreamingEndpointName = ($line.AdditionalInfo | ConvertFrom-JSON).StreamingEndpointName
$line.AdditionalInfo_Id = ($line.AdditionalInfo | ConvertFrom-JSON).Id
$line.AdditionalInfo_AppServicePlanUri = ($line.AdditionalInfo | ConvertFrom-JSON).AppServicePlanUri
$line.AdditionalInfo_ImageType = ($line.AdditionalInfo | ConvertFrom-JSON).ImageType
$line.AdditionalInfo_ServiceType = ($line.AdditionalInfo | ConvertFrom-JSON).ServiceType
$line.AdditionalInfo_VMName = ($line.AdditionalInfo | ConvertFrom-JSON).VMName
$line.AdditionalInfo_UsageType = ($line.AdditionalInfo | ConvertFrom-JSON).UsageType
$line.AdditionalInfo_DatabaseAccount = ($line.AdditionalInfo | ConvertFrom-JSON).DatabaseAccount
$line.AdditionalInfo_CollectionRid = ($line.AdditionalInfo | ConvertFrom-JSON).CollectionRid
$line.AdditionalInfo_ResourceCategory = ($line.AdditionalInfo | ConvertFrom-JSON).ResourceCategory
$line.Tags_displayName = ($line.Tags | ConvertFrom-JSON).displayName
$line.'Tags_ACCESSED-VIA-INTERNET' = ($line.Tags | ConvertFrom-JSON).'ACCESSED-VIA-INTERNET'
$line.'Tags_APP-NAME' = ($line.Tags | ConvertFrom-JSON).'APP-NAME'
$line.'Tags_APP-TYPE' = ($line.Tags | ConvertFrom-JSON).'APP-TYPE'
$line.Tags_APPTYPE = ($line.Tags | ConvertFrom-JSON).APPTYPE
$line.Tags_CHARGECODE = ($line.Tags | ConvertFrom-JSON).CHARGECODE
$line.Tags_COMMENTS = ($line.Tags | ConvertFrom-JSON).COMMENTS
$line.Tags_COUNTRY = ($line.Tags | ConvertFrom-JSON).COUNTRY
$line.'Tags_EY-REGION' = ($line.Tags | ConvertFrom-JSON).'EY-REGION'
$line.'Tags_IT-ENV' = ($line.Tags | ConvertFrom-JSON).'IT-ENV'
$line.Tags_OWNER = ($line.Tags | ConvertFrom-JSON).OWNER
$line.'Tags_OWNER-EMAIL' = ($line.Tags | ConvertFrom-JSON).'OWNER-EMAIL'
$line.Tags_SERVICELINE = ($line.Tags | ConvertFrom-JSON).SERVICELINE
$line.'Tags_SUB-TYPE' = ($line.Tags | ConvertFrom-JSON).'SUB-TYPE'
$line.Tags_TECHCONTACTS = ($line.Tags | ConvertFrom-JSON).TECHCONTACTS
}
}
catch {}
}
#write-output $info
$file | Export-Csv 'C:\output.csv' -NoTypeInformation
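Add-Member performs poorly. So does saving everything to a variable over and over. You may have better luck keeping everything in a single pipeline and using Select-Object with calculated properties: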
Get-Content -Path 'input.csv' |
Select-Object -Skip 2 |
ConvertFrom-Csv |
Select-Object -ErrorAction SilentlyContinue -Property *,
@{n = 'AdditionalInfo_StreamingEndpointName' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).StreamingEndpointName}},
@{n = 'AdditionalInfo_Id' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).Id}},
@{n = 'AdditionalInfo_AppServicePlanUri' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).AppServicePlanUri}},
@{n = 'AdditionalInfo_ImageType' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).ImageType}},
@{n = 'AdditionalInfo_ServiceType' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).ServiceType}},
@{n = 'AdditionalInfo_VMName' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).VMName}},
@{n = 'AdditionalInfo_UsageType' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).UsageType}},
@{n = 'AdditionalInfo_DatabaseAccount' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).DatabaseAccount}},
@{n = 'AdditionalInfo_CollectionRid' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).CollectionRid}},
@{n = 'AdditionalInfo_ResourceCategory' ; e = {($_.AdditionalInfo | ConvertFrom-JSON).ResourceCategory}},
@{n = 'Tags_displayName' ; e = {($_.Tags | ConvertFrom-JSON).displayName}},
@{n = 'Tags_ACCESSED-VIA-INTERNET' ; e = {($_.Tags | ConvertFrom-JSON).'ACCESSED-VIA-INTERNET'}},
@{n = 'Tags_APP-NAME' ; e = {($_.Tags | ConvertFrom-JSON).'APP-NAME'}},
@{n = 'Tags_APP-TYPE' ; e = {($_.Tags | ConvertFrom-JSON).'APP-TYPE'}},
@{n = 'Tags_APPTYPE' ; e = {($_.Tags | ConvertFrom-JSON).APPTYPE}},
@{n = 'Tags_CHARGECODE' ; e = {($_.Tags | ConvertFrom-JSON).CHARGECODE}},
@{n = 'Tags_COMMENTS' ; e = {($_.Tags | ConvertFrom-JSON).COMMENTS}},
@{n = 'Tags_COUNTRY' ; e = {($_.Tags | ConvertFrom-JSON).COUNTRY}},
@{n = 'Tags_EY-REGION' ; e = {($_.Tags | ConvertFrom-JSON).'EY-REGION'}},
@{n = 'Tags_IT-ENV' ; e = {($_.Tags | ConvertFrom-JSON).'IT-ENV'}},
@{n = 'Tags_OWNER' ; e = {($_.Tags | ConvertFrom-JSON).OWNER}},
@{n = 'Tags_OWNER-EMAIL' ; e = {($_.Tags | ConvertFrom-JSON).'OWNER-EMAIL'}},
@{n = 'Tags_SERVICELINE' ; e = {($_.Tags | ConvertFrom-JSON).SERVICELINE}},
@{n = 'Tags_SUB-TYPE' ; e = {($_.Tags | ConvertFrom-JSON).'SUB-TYPE'}},
@{n = 'Tags_TECHCONTACTS' ; e = {($_.Tags | ConvertFrom-JSON).TECHCONTACTS}} |
Export-Csv 'C:\output.csv' -NoTypeInformation
Come to think of it, this might be faster still:
Get-Content -Path 'input.csv' |
Select-Object -Skip 2 |
ConvertFrom-Csv |
ForEach-Object {
$AdditionalInfo = $_.AdditionalInfo | ConvertFrom-Json;
$Tags = $_.Tags | ConvertFrom-Json;
$_ | Select-Object -Property *,
@{n = 'AdditionalInfo_StreamingEndpointName' ; e = {$AdditionalInfo.StreamingEndpointName}},
@{n = 'AdditionalInfo_Id' ; e = {$AdditionalInfo.Id}},
...
} | Export-Csv ...
This way, the JSON is only converted once per row.
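The parse-once idea can also be driven from lists of property names rather than one hand-written calculated property per column. The following is only a minimal sketch under stated assumptions: the sample rows, the shortened `$infoNames` and `$tagNames` lists, and the `Expand-JsonColumns` helper are all hypothetical and not part of the original script. Unlike the original loop, it does not catch malformed JSON.

```powershell
# Hypothetical sample standing in for the rows of input.csv (two JSON columns).
$sampleCsv = @'
Name,AdditionalInfo,Tags
vm1,"{""VMName"":""web01"",""ServiceType"":""Standard_D2""}","{""OWNER"":""alice"",""IT-ENV"":""prod""}"
db1,,"{""OWNER"":""bob""}"
'@

# Shortened lists for illustration; the real script would list all 25 names.
$infoNames = 'VMName', 'ServiceType'
$tagNames  = 'OWNER', 'IT-ENV'

# Hypothetical helper: expands both JSON columns of each incoming row.
function Expand-JsonColumns {
    param(
        [Parameter(ValueFromPipeline)] $Row,
        [string[]] $InfoNames,
        [string[]] $TagNames
    )
    process {
        # Parse each JSON column once per row; empty cells stay $null.
        $info = if ($Row.AdditionalInfo) { $Row.AdditionalInfo | ConvertFrom-Json }
        $tags = if ($Row.Tags) { $Row.Tags | ConvertFrom-Json }

        # Copy the existing columns, then append the new ones in order.
        $out = [ordered]@{}
        foreach ($p in $Row.PSObject.Properties) { $out[$p.Name] = $p.Value }
        foreach ($n in $InfoNames) { $out["AdditionalInfo_$n"] = $info.$n }
        foreach ($n in $TagNames)  { $out["Tags_$n"] = $tags.$n }
        [pscustomobject]$out
    }
}

$rows = $sampleCsv | ConvertFrom-Csv |
    Expand-JsonColumns -InfoNames $infoNames -TagNames $tagNames
```

For the real file, the sample string would presumably be replaced by the same `Get-Content | Select-Object -Skip 2 | ConvertFrom-Csv` source used above, with the result piped straight into `Export-Csv` instead of being collected in `$rows`.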
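However, I suspect that to get decent performance you will have to write something using .NET methods. I would suggest using Microsoft.VisualBasic.FileIO.TextFieldParser to parse the CSV line by line for you, and possibly JSON.NET to deserialize the JSON with JsonConvert.DeserializeObject(). Even then, it isn't going to be fast. 1.5 GB has to be several million rows. You might be better off importing the whole CSV into SQL Server 2016+ and using queries with the built-in JSON parsing there.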
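As a rough illustration of the TextFieldParser suggestion, here is a minimal sketch that reads a CSV row by row from PowerShell. It rests on assumptions: the temporary sample file and its two columns are hypothetical, the Microsoft.VisualBasic assembly is assumed to be loadable on your PowerShell version, and ConvertFrom-Json is used in place of JSON.NET to keep the example self-contained. No timing claims are made for it.

```powershell
# Load the assembly containing TextFieldParser (assumed to be available).
Add-Type -AssemblyName Microsoft.VisualBasic

# Hypothetical sample file standing in for input.csv: two junk lines, then the CSV.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'textfieldparser-sample.csv'
Set-Content -Path $path -Value @(
    'report line 1'
    'report line 2'
    'Name,Tags'
    'vm1,"{""OWNER"":""alice""}"'
    'db1,'
)

$parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser -ArgumentList $path
try {
    $parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
    $parser.SetDelimiters(',')
    $parser.HasFieldsEnclosedInQuotes = $true

    # Skip the two leading lines, mirroring Select-Object -Skip 2.
    $null = $parser.ReadLine()
    $null = $parser.ReadLine()

    # The next row is the header; find the column that holds the JSON.
    $header  = $parser.ReadFields()
    $tagsIdx = [array]::IndexOf($header, 'Tags')

    $owners = @(
        while (-not $parser.EndOfData) {
            $fields = $parser.ReadFields()
            # Parse the JSON cell once; an empty cell yields no object.
            $tags = if ($fields[$tagsIdx]) { $fields[$tagsIdx] | ConvertFrom-Json }
            [pscustomobject]@{ Name = $fields[0]; Tags_OWNER = $tags.OWNER }
        }
    )
}
finally {
    # Release the file handle and delete the sample file.
    $parser.Close()
    Remove-Item -Path $path
}
```

For a file of the size in question, each row would more likely be written out as it is read (for example by piping the loop's output to Export-Csv) rather than collected in memory as `$owners` is here.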