
Convert JS format csv to a delimited CSV using PowerShell

I have a sample csv file and tried to convert it to a delimited-format CSV using PowerShell. The timestamps are stored as seconds by default, and I am wondering whether they can be converted to "hh:mm".

I'm not sure where to start.

Thanks for the help!

sample.csv

{
   "Body" : {
      "inverter/1" : {
         "Data" : {
            "Current_DC_String_1" : {
               "Unit" : "A",
               "Values" : {
                  "0" : 0,
                  "300" : 0,
                  "600" : 0,
                  "900" : 0
               },
               "_comment" : "channelId=66050"
            },
            "Current_DC_String_2" : {
               "Unit" : "A",
               "Values" : {
                  "0" : 0,
                  "300" : 0,
                  "600" : 0,
                  "900" : 0
               },
               "_comment" : "channelId=131586"
            },
            "EnergyReal_WAC_Sum_Produced" : {
               "Unit" : "Wh",
               "Values" : {
                  "0" : 0,
                  "300" : 0,
                  "600" : 0,
                  "900" : 0
               },
               "_comment" : "channelId=67830024"
            },
            "Voltage_DC_String_1" : {
               "Unit" : "V",
               "Values" : {
                  "0" : 7.3000000000000007,
                  "300" : 7.3000000000000007,
                  "600" : 7.9000000000000004,
                  "900" : 7.7000000000000002
               },
               "_comment" : "channelId=66049"
            },
            "Voltage_DC_String_2" : {
               "Unit" : "V",
               "Values" : {
                  "0" : 4.2000000000000002,
                  "300" : 4.2000000000000002,
                  "600" : 4.5,
                  "900" : 4.4000000000000004
               },
               "_comment" : "channelId=131585"
            }
         },
         "DeviceType" : 233,
         "End" : "2020-03-11T23:59:59+11:00",
         "NodeType" : 97,
         "Start" : "2020-03-11T00:00:00+11:00"
      },
      "inverter/2" : {
         "Data" : {
            "Current_DC_String_1" : {
               "Unit" : "A",
               "Values" : {
                  "0" : 0,
                  "300" : 0,
                  "600" : 0,
                  "900" : 0
               },
               "_comment" : "channelId=66050"
            },
            "Current_DC_String_2" : {
               "Unit" : "A",
               "Values" : {
                  "0" : 0,
                  "300" : 0,
                  "600" : 0,
                  "900" : 0
               },
               "_comment" : "channelId=131586"
            },
            "EnergyReal_WAC_Sum_Produced" : {
               "Unit" : "Wh",
               "Values" : {
                  "0" : 0,
                  "300" : 0,
                  "600" : 0,
                  "900" : 0
               },
               "_comment" : "channelId=67830024"
            },
            "Voltage_DC_String_1" : {
               "Unit" : "V",
               "Values" : {
                  "0" : 6.7000000000000002,
                  "300" : 7,
                  "600" : 6.8000000000000007,
                  "900" : 7.2000000000000002
               },
               "_comment" : "channelId=66049"
            },
            "Voltage_DC_String_2" : {
               "Unit" : "V",
               "Values" : {
                  "0" : 2.2000000000000002,
                  "300" : 2.3000000000000003,
                  "600" : 2.2000000000000002,
                  "900" : 2.2000000000000002
               },
               "_comment" : "channelId=131585"
            }
         },
         "DeviceType" : 233,
         "End" : "2020-03-11T23:59:59+11:00",
         "NodeType" : 98,
         "Start" : "2020-03-11T00:00:00+11:00"
      }
   },
   "Head" : {
      "RequestArguments" : {
         "Query" : "Inverter+SensorCard+Meter",
         "Scope" : "System"
      },
      "Status" : {
         "Code" : 0,
         "Reason" : "",
         "UserMessage" : ""
      },
      "Timestamp" : "2020-03-11T01:00:03+11:00"
   }
}

Expected result with the default timestamp converted:

[image not included]

Or, if possible, prepend the date "2020-03-11", parsed from "Start" : "2020-03-11T00:00:00+11:00", to the converted time to make a DateTimestamp for each row.

[image not included]

Assuming your data is actually a JSON file, and not a CSV file, you could try the approach below. It basically converts the JSON file to a System.Collections.Hashtable object with ConvertFrom-Json using the -AsHashtable switch (available in PowerShell 6 and later). You can read how to iterate through hashtable entries in "Looping through a hash, or using an array in PowerShell".

You can then build System.Management.Automation.PSCustomObject rows and pipe them to Export-Csv, which creates the CSV file. Additionally, you can get the time span using [System.TimeSpan]::FromSeconds(), which converts total seconds to an object of type System.TimeSpan, and format it as hh:mm with System.TimeSpan.ToString(). For more information on timespan formatting, have a look at "Convert seconds to hh:mm:ss,fff format in PowerShell".
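As a small illustration of the conversion described above, the sketch below formats a few second offsets like the keys under "Values". The list of offsets (including 3900) is made up for the example. Note that hh is the hours component of the time span, so it wraps after 24 hours, which is fine for a single day of samples.

```powershell
# Convert second offsets (like the keys under "Values") to hh:mm strings.
# The offsets listed here are illustrative, not taken from a real file.
$formatted = foreach ($seconds in 0, 300, 600, 900, 3900) {
    # The backslash escapes the colon, which is otherwise not allowed
    # as a literal in a custom TimeSpan format string.
    [timespan]::FromSeconds($seconds).ToString('hh\:mm')
}

$formatted -join ', '   # 00:00, 00:05, 00:10, 00:15, 01:05
```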

As an extra cleanup step, I also removed the " quotes with Set-Content. This isn't necessary if you would like the " to persist in your file.

$json = Get-Content -Path .\sample.json | ConvertFrom-Json -AsHashtable

$json.Body.GetEnumerator() | ForEach-Object {
    $inverter = $_.Key

    $_.Value.Data.GetEnumerator() | ForEach-Object {
        $value = $_.Key

        $_.Value.Values.GetEnumerator() | ForEach-Object {
            [PSCustomObject]@{
                Inverter = "$inverter $value"
                Second = [timespan]::FromSeconds($_.Key).ToString("hh\:mm")
                Value = $_.Value
            }
        }
    }
} | Export-Csv -Path .\sample.csv
# Use -NoTypeInformation to remove the #TYPE line from the output in PowerShell versions before 6

Set-Content -Path .\sample.csv -Value ((Get-Content -Path .\sample.csv) -replace '"')

sample.csv

Inverter,Second,Value
inverter/1 Current_DC_String_2,00:05,0
inverter/1 Current_DC_String_2,00:10,0
inverter/1 Current_DC_String_2,00:15,0
inverter/1 Current_DC_String_2,00:00,0
inverter/1 Current_DC_String_1,00:05,0
inverter/1 Current_DC_String_1,00:10,0
...
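As a possible alternative to stripping the quotes with Set-Content afterwards, and assuming PowerShell 7.0 or later, Export-Csv and ConvertTo-Csv accept a -UseQuotes parameter. The sketch below uses two hand-written rows (not read from the sample file) and ConvertTo-Csv so that the result can be inspected without writing a file.

```powershell
# Assumes PowerShell 7.0+, where ConvertTo-Csv / Export-Csv support -UseQuotes.
# The two rows are hand-written stand-ins for the objects built in the loops.
$rows = @(
    [pscustomobject]@{ Inverter = 'inverter/1 Current_DC_String_1'; Second = '00:00'; Value = 0 }
    [pscustomobject]@{ Inverter = 'inverter/1 Current_DC_String_1'; Second = '00:05'; Value = 0 }
)

# -UseQuotes Never omits the double quotes around every field.
$csvLines = $rows | ConvertTo-Csv -UseQuotes Never
$csvLines
```

Be aware that Never also leaves fields unquoted when they contain the delimiter, which would produce a malformed file; -UseQuotes AsNeeded is the safer choice if that can happen.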

Performance Improvement

As mklement0 explained, when using .NET types or [PSCustomObject], member enumeration is much faster than using pipelines. You can find out more from this helpful answer.

Below is a simple example of the improvement that can be made by using foreach enumeration instead of ForEach-Object.

$json = Get-Content -Path .\sample.json | ConvertFrom-Json -AsHashtable

$csvRows = @()

foreach ($inverter in $json.Body.GetEnumerator()) {

    foreach ($outerValue in $inverter.Value.Data.GetEnumerator()) {

        foreach ($innerValue in $outerValue.Value.Values.GetEnumerator()){
            $csvRowData = [PSCustomObject]@{
                Inverter = "$($inverter.Key) $($outerValue.Key)"
                Second = [timespan]::FromSeconds($innerValue.Key).ToString("hh\:mm")
                Value = $innerValue.Value
            }

            $csvRows += $csvRowData;
        }
    }
}

$csvRows | Export-Csv -Path .\sample.csv

Set-Content -Path .\sample.csv -Value ((Get-Content -Path .\sample.csv) -replace '"')
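One further tweak, offered as a sketch rather than as part of the original answer: appending to an array with += allocates a new array and copies the existing elements each time, so with many rows it may be cheaper to let the foreach statement emit the objects and assign its output directly. The inline JSON below is a trimmed, made-up stand-in with the same shape as sample.json, and the variable names $measurement and $sample are chosen for this example only.

```powershell
# A trimmed-down stand-in for sample.json (hypothetical values, same shape).
# Requires PowerShell 6+ for -AsHashtable.
$json = @'
{ "Body": { "inverter/1": { "Data": { "Voltage_DC_String_1": {
    "Unit": "V", "Values": { "0": 7.3, "300": 7.9 } } } } } }
'@ | ConvertFrom-Json -AsHashtable

# Everything the foreach statement outputs is collected in $csvRows;
# no intermediate array and no += are needed.
$csvRows = foreach ($inverter in $json.Body.GetEnumerator()) {
    foreach ($measurement in $inverter.Value.Data.GetEnumerator()) {
        foreach ($sample in $measurement.Value.Values.GetEnumerator()) {
            [pscustomobject]@{
                Inverter = "$($inverter.Key) $($measurement.Key)"
                Second   = [timespan]::FromSeconds([int] $sample.Key).ToString('hh\:mm')
                Value    = $sample.Value
            }
        }
    }
}

# Shown as CSV text here; pipe to Export-Csv instead to write a file.
$csvRows | ConvertTo-Csv
```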

Your input file is a JSON file, not a CSV file.

To flatten its object graph into the CSV row-column structure you need, nested loops are required:

# Parse the JSON file into custom objects.
$fromJson = Get-Content -Raw file.json | ConvertFrom-Json

& {
  foreach ($inverter in $fromJson.Body.psobject.Properties.Name) {
    $date =  $fromJson.Body.$inverter.Start
    if ($date -is [datetime]) { $date = $date.ToString('yyyy-MM-dd') }
    else                      { $date = ($date -csplit 'T')[0] }
    foreach ($measurement in $fromJson.Body.$inverter.Data.psobject.Properties.Name) {
      foreach ($valueProp in $fromJson.Body.$inverter.Data.$measurement.Values.psobject.Properties) {
        [pscustomobject] @{
          Inverter  = "$inverter $measurement"
          TimeStamp = $date + ' ' + 
                      [timespan]::FromSeconds([int] $valueProp.Name).ToString('hh\:mm')
          Value     = $valueProp.Value
        }
      }
    }
  } 
} | ConvertTo-Csv  # output CSV data as an array of *strings*; 
                   # to save to a *file*, use something like:
                   # Export-Csv -NoTypeInformation out.csv

Note how .psobject.Properties is used to reflect on a given object's properties; .psobject is a normally hidden property available on any object, and it provides reflection information more conveniently and faster than the Get-Member cmdlet does.

Also note that how the timestamps in your JSON are parsed by ConvertFrom-Json depends on the PowerShell edition (version):

  • Windows PowerShell parses them as strings, so it's sufficient to split the string by T and take what comes before it.

  • PowerShell [Core] parses them as [datetime] instances, expressed in local time, so they're only guaranteed to result in the same calendar day if the local time zone is the same as the one implied by the UTC offset in the JSON values (+11:00).
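To illustrate the time zone point, here is a minimal sketch that applies only while the "Start" value is still available as a string (for example in Windows PowerShell). Parsing it as [datetimeoffset] keeps the +11:00 offset instead of converting to local time, so the calendar date does not depend on the machine's time zone. Once ConvertFrom-Json has already produced a local [datetime], the original offset is no longer available and this does not help.

```powershell
# The "Start" value from the sample, assumed to still be a string here.
$start = '2020-03-11T00:00:00+11:00'

# [datetimeoffset] preserves the +11:00 offset rather than shifting to local time.
$dto  = [datetimeoffset]::Parse($start, [cultureinfo]::InvariantCulture)
$date = $dto.ToString('yyyy-MM-dd', [cultureinfo]::InvariantCulture)

$date   # 2020-03-11
```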


Optional reading: performance considerations:

Note the use of nested foreach loops rather than the ForEach-Object cmdlet in the pipeline, for better performance. See this answer for background information.

With small input files that may not matter, however, and RoadRunner's helpful hashtable-based alternative, which uses nested pipelines, may well be fast enough in practice; it too could be made to use foreach loops instead (update: it now does, in a second command).

Parsing the JSON into hashtables ([hashtable], aka System.Collections.Hashtable) via -AsHashtable:

  • has the advantage of requiring less memory (the internal storage of the [pscustomobject] instances that ConvertFrom-Json outputs by default is somewhat inefficient).

  • has the potential disadvantage of not preserving the input order of properties, given that [hashtable] entries are inherently unordered; in the case at hand, this is not a concern, however, given that different output objects with a fixed property order are created.
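If the order of the rows themselves matters when working with hashtables, one option (a sketch, not something either answer does) is to sort the entries by their key before building the output. The $values hashtable below is a made-up stand-in for one parsed "Values" object; the keys are cast to [int] because, as strings, "1200" would sort before "300".

```powershell
# Hypothetical stand-in for one "Values" hashtable from the parsed JSON.
$values = @{ '300' = 7.3; '1200' = 7.5; '0' = 7.3; '900' = 7.7 }

# Sort numerically by key; a plain string sort would yield 0, 1200, 300, 900.
$ordered = $values.GetEnumerator() | Sort-Object { [int] $_.Key }

$ordered.Key -join ','   # 0,300,900,1200
```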
