PowerShell - Parse datetime string to datetime object

I am trying to parse a file and convert a date string to a datetime object in the en-US culture, but I get an ArgumentOutOfRangeException.

I am stuck on two problems:

  • a) I need the month, but it looks like I can't use the 'a' UFormat specifier. (I can discard the day name in the string, e.g. 'Wed', 'Sun', 'Sat', to make the script easier.)
  • b) I need to replace "now" with " ".

BARCODE     LOCATION    LIBRARY            STORAGEPOLICY                     RETAIN UNTILL DATE       
-------     --------    -------            -------------                     ------------------       
L40065L8    IEPort1     DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_WEEK    Wed Mar 31 10:13:07 2021 
L40063L8    slot 1      DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_MONTH   Sun Mar  6 22:34:39 2022 
L40072L8    slot 5      DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_ANNUAL  now                      
L40071L8    slot 6      DRP_TAPE_DRPLTO                                      now                      
L40070L8    slot 7      DRP_TAPE_DRPLTO                                      now                      
L40064L8    slot 8      DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_MONTH   Sat Mar 19 11:10:37 2022

$lines = [System.IO.File]::ReadAllLines("c:\temp\qmedia.txt")

$lines = $lines | Select-Object -Skip 2
$objects = $lines | % {
    return [PSCustomObject]@{
        BARCODE  = $_.Substring(0,8).Trim()
        LOCATION = $_.Substring(12,8).Trim()
        LIBRARY = $_.Substring(24,15).Trim()
        STORAGEPOLICY = $_.Substring(44,33).Trim()
        RETAINUNTIL = [datetime]::ParseExact($_.Substring(78,25).Trim()), "a dd hh:mm:ss yyyy", [Globalization.CultureInfo]::CreateSpecificCulture('en-US'))
    }
}

$objects

Can anyone help me?

As @Matt mentioned in the comments, the first part of your problem is the data format: you're relying on the exact column widths being correct when you use Substring(78, 25), which looks to be incorrect for your data...

PS> $line = "L40065L8    IEPort1     DRP_TAPE_DRPLTO    _DRP_GLB_SECOND_COPY_TAPE_WEEK    Wed Mar 31 10:13:07 2021 "
PS> $line.Substring(78)
ed Mar 31 10:13:07 2021

gives ed Mar 31 10:13:07 2021 instead of what you're probably expecting, which is Wed Mar 31 10:13:07 2021.
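To see where the off-by-one comes from, and why the original call raises ArgumentOutOfRangeException, a quick check along these lines can help. This is only a sketch: the sample row is rebuilt with PadRight so that the spacing is unambiguous, and the pad widths (12, 12, 19, 34) are assumptions read off the sample data, not part of any specification.

```powershell
# Rebuild the first data row; the pad widths (12, 12, 19, 34) are assumptions
# taken from the sample data in the question.
$line = 'L40065L8'.PadRight(12) +
        'IEPort1'.PadRight(12) +
        'DRP_TAPE_DRPLTO'.PadRight(19) +
        '_DRP_GLB_SECOND_COPY_TAPE_WEEK'.PadRight(34) +
        'Wed Mar 31 10:13:07 2021 '

# Find where the date column really starts.
$dateStart = $line.IndexOf('Wed')

# Substring(78, 25) needs the line to be at least 103 characters long,
# so on a shorter line .NET raises an exception.
$errorName = $null
try {
    $null = $line.Substring(78, 25)
} catch {
    # PowerShell wraps .NET method exceptions; GetBaseException() unwraps them.
    $errorName = $_.Exception.GetBaseException().GetType().Name
}

"Date column starts at index $dateStart; line length is $($line.Length)"
"Substring(78, 25) failed with: $errorName"
```

With this row the date starts at index 77 and the line is 102 characters long, so a start of 78 both drops the "W" and runs one character past the end.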

If you can, it would be better to change your data format to e.g. CSV or JSON so you can extract the fields more easily, but if you can't do that you could try to calculate the column widths dynamically, e.g.:

$columns = [regex]::Matches($lines[1], "-+").Index;
# 0
# 12
# 24
# 43
# 77

This basically finds the start position of each of the "------" heading underlines, and then you can do something like:

$objects = $lines | % {
    return [PSCustomObject] @{
        BARCODE  = $_.Substring($columns[0], $columns[1] - $columns[0]).Trim()
        LOCATION = $_.Substring($columns[1], $columns[2] - $columns[1]).Trim()
        LIBRARY = $_.Substring($columns[2], $columns[3] - $columns[2]).Trim()
        STORAGEPOLICY = $_.Substring($columns[3], $columns[4] - $columns[3]).Trim()
        RETAINUNTIL = [datetime]::ParseExact(
            $_.Substring($columns[4]).Trim(),
            "a dd hh:mm:ss yyyy",
            [Globalization.CultureInfo]::CreateSpecificCulture("en-US")
        )
    }
}

Except now we're getting this error:

Exception calling "ParseExact" with "3" argument(s): "String 'Wed Mar 31 10:13:07 2021' was not recognized as a valid DateTime."

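which we can fix by using .NET custom format specifiers (ddd for the abbreviated day name, MMM for the abbreviated month name, HH for the 24-hour clock) instead of the "a dd hh:mm:ss yyyy" pattern: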

[datetime]::ParseExact(
   "Wed Mar 31 10:13:07 2021",
   "ddd MMM dd HH:mm:ss yyyy",
   [Globalization.CultureInfo]::CreateSpecificCulture("en-US")
)
# 31 March 2021 10:13:07

but you've also got this date format:

Sun Mar  6 22:34:39 2022

(two spaces when the day part is a single digit)

so we need to use this overload of ParseExact instead to allow both formats:

[datetime]::ParseExact(
   "Sun Mar  6 22:34:39 2022",
   [string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM  d HH:mm:ss yyyy"),
   [Globalization.CultureInfo]::CreateSpecificCulture("en-US"),
   "None"  
)
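As a small sanity check of the overload above, the sketch below runs both sample strings from the question through the same format list. The variable names ($formats, $enUS, $samples, $parsed) are illustrative only.

```powershell
$formats = [string[]] @('ddd MMM dd HH:mm:ss yyyy', 'ddd MMM  d HH:mm:ss yyyy')
$enUS    = [Globalization.CultureInfo]::CreateSpecificCulture('en-US')
$samples = 'Wed Mar 31 10:13:07 2021', 'Sun Mar  6 22:34:39 2022'

# ParseExact tries each format in turn and succeeds if any one of them matches.
$parsed = foreach ($text in $samples) {
    [datetime]::ParseExact($text, $formats, $enUS, [Globalization.DateTimeStyles]::None)
}

# Print in a fixed, culture-independent layout.
$parsed | ForEach-Object {
    $_.ToString('yyyy-MM-dd HH:mm:ss', [Globalization.CultureInfo]::InvariantCulture)
}
```

The string "None" in the answer's call is converted by PowerShell to the same [Globalization.DateTimeStyles]::None value used here.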

We then need to allow for the literal string now, so your final code becomes:

$lines = [System.IO.File]::ReadAllLines("c:\temp\qmedia.txt")

$columns = [regex]::Matches($lines[1], "-+").Index;

$lines = $lines | Select-Object -Skip 2
$objects = $lines | % {
    return [PSCustomObject] @{
        BARCODE  = $_.Substring($columns[0], $columns[1] - $columns[0]).Trim()
        LOCATION = $_.Substring($columns[1], $columns[2] - $columns[1]).Trim()
        LIBRARY = $_.Substring($columns[2], $columns[3] - $columns[2]).Trim()
        STORAGEPOLICY = $_.Substring($columns[3], $columns[4] - $columns[3]).Trim()
        RETAINUNTIL = if( $_.Substring($columns[4]).Trim() -eq "now" ) {
            " "
        } else {
            [datetime]::ParseExact(
                $_.Substring($columns[4]).Trim(),
                [string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM  d HH:mm:ss yyyy"),
                [Globalization.CultureInfo]::CreateSpecificCulture("en-US"),
                "None"
            )
        }
    }
}

$objects | ft

#BARCODE  LOCATION LIBRARY         STORAGEPOLICY                    RETAINUNTIL
#-------  -------- -------         -------------                    -----------
#L40065L8 IEPort1  DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_WEEK   31/03/2021 10:13:07
#L40063L8 slot 1   DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_MONTH  06/03/2022 22:34:39
#L40072L8 slot 5   DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_ANNUAL
#L40071L8 slot 6   DRP_TAPE_DRPLTO
#L40070L8 slot 7   DRP_TAPE_DRPLTO
#L40064L8 slot 8   DRP_TAPE_DRPLTO _DRP_GLB_SECOND_COPY_TAPE_MONTH  19/03/2022 11:10:37

Update

Inspired by mklement0's answer, it might be useful to have a generalised parser for your file format. This returns a set of pscustomobjects with properties that match the file headers:

function ConvertFrom-MyFormat
{

    param
    (
        [Parameter(Mandatory=$true)]
        [string[]] $Lines
    )

    # find the runs of dashes on the underline row so we can access each one's index and length
    # (named $underlines to avoid clobbering the automatic $Matches variable)
    $underlines = [regex]::Matches($Lines[1], "-+");

    # extract the header names from the first line using the
    # positions of the dash runs in the second line as a cutting guide
    $headers = $underlines | foreach-object {
        $Lines[0].Substring($_.Index, $_.Length);
    }

    # process the data lines and return a custom object for each one.
    # (the property names will match the headers)
    $Lines | select-object -Skip 2 | foreach-object {
        $line = $_;
        $values = [ordered] @{};
        0..($underlines.Count-2) | foreach-object {
            $values.Add($headers[$_], $line.Substring($underlines[$_].Index, $underlines[$_+1].Index - $underlines[$_].Index));
        }
        $values.Add($headers[-1], $line.Substring($underlines[-1].Index));
        new-object PSCustomObject -Property $values;
    }

}

and your main code then just becomes a matter of cleaning up and restructuring the result of this function:

$lines = [System.IO.File]::ReadAllLines("c:\temp\qmedia.txt")

$objects = ConvertFrom-MyFormat -Lines $lines | foreach-object {
    return new-object PSCustomObject -Property ([ordered] @{
        BARCODE = $_.BARCODE.Trim()
        LOCATION = $_.LOCATION.Trim()
        LIBRARY = $_.LIBRARY.Trim()
        STORAGEPOLICY = $_.STORAGEPOLICY.Trim()
        RETAINUNTIL = if( $_."RETAIN UNTILL DATE".Trim() -eq "now" ) {
            " "
        } else {
            [datetime]::ParseExact(
                $_."RETAIN UNTILL DATE".Trim(),
                [string[]] @( "ddd MMM dd HH:mm:ss yyyy", "ddd MMM  d HH:mm:ss yyyy"),
                [Globalization.CultureInfo]::CreateSpecificCulture("en-US"),
                "None"
            )
        }
    })
}

$objects | ft;
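If the same "now"-or-date conversion is needed in more than one place, it could be pulled into a small helper. The function name ConvertTo-RetainDate is hypothetical (it is not part of the original answer), and the sketch keeps the " " placeholder for "now" to match the code above.

```powershell
# Hypothetical helper: turns the raw RETAIN UNTILL DATE text into either
# a [datetime] or the " " placeholder used for "now".
function ConvertTo-RetainDate {
    param(
        [Parameter(Mandatory = $true)]
        [string] $Text
    )

    $value = $Text.Trim()
    if ($value -eq 'now') {
        return ' '
    }

    [datetime]::ParseExact(
        $value,
        [string[]] @('ddd MMM dd HH:mm:ss yyyy', 'ddd MMM  d HH:mm:ss yyyy'),
        [Globalization.CultureInfo]::CreateSpecificCulture('en-US'),
        [Globalization.DateTimeStyles]::None
    )
}

# Usage with two raw column values from the sample file.
$a = ConvertTo-RetainDate 'now                      '
$b = ConvertTo-RetainDate 'Sat Mar 19 11:10:37 2022'
```

Inside the main loop the property could then be written as RETAINUNTIL = ConvertTo-RetainDate $_."RETAIN UNTILL DATE", assuming the helper is defined earlier in the script.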

mclayton's helpful answer provides good explanations and an effective solution.

Let me complement it with an approach that:

  • generically parses fixed-column-width input files
  • assuming that the column widths can reliably be inferred from the separator line (the 2nd line), such that each substring between adjacent column separators such as ------- indicates a column.

Note: The code requires PowerShell (Core) v6.2.1+, but could be adapted to work in Windows PowerShell too.

$sepChar = '-' # The char. used on the separator line to indicate column spans.
Get-Content c:\temp\qmedia.txt | ForEach-Object {
  $line = $_
  switch ($_.ReadCount) {
    1 { 
      # Header line: save for later analysis
      $headerLine = $line
      break
    } 
    2 { 
      # Separator line: it is the only reliable indicator of column width.
      # Construct a regex that captures the column values.
      # With the sample input's separator line, the resulting regex is:
      #     (.{12})(.{12})(.{19})(.{34})(.{25})
      # Note: Syntax requires PowerShell (Core) v6.2.1+
      $reCaptureColumns = 
        $line -replace ('{0}+[^{0}]+' -f [regex]::Escape($sepChar)), 
                       { "(.{$($_.Value.Length)})" }
      # Break the header line into column names.
      if ($headerLine -notmatch $reCaptureColumns) { Throw "Unexpected header line format: $headerLine" }
      # Save the array of column names.
      $columnNames = $Matches[1..($Matches.Count - 1)].TrimEnd()  
      break
    }
    default {
      # Data line:
      if ($line -notmatch $reCaptureColumns) { Throw "Unexpected line format: $line" }
      # Construct an ordered hashtable from the column values.
      $oht = [ordered] @{ }
      foreach ($ndx in 1..$columnNames.Count) {
        $oht[$columnNames[$ndx-1]] = $Matches[$ndx].TrimEnd()
      }
      [pscustomobject] $oht # Convert to [pscustomobject] and output.
    }
  }
}

The above outputs a stream of [pscustomobject] instances, which allow for robust, convenient further processing, such as the date parsing you require, as shown in mclayton's answer (of course, you could integrate this processing directly into the code above, but I wanted to show the fixed-width parsing solution alone).
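Regarding the version note: judging by the comments in the code, the construct that needs PowerShell (Core) is the script block used as the replacement operand of -replace. One possible adaptation for Windows PowerShell, offered as a sketch and not as a tested drop-in, is to call [regex]::Replace() with a script block acting as the match evaluator. The separator line below is rebuilt from the column widths of the sample input, and $sepLine and $pattern are illustrative names.

```powershell
$sepChar = '-'

# Separator line rebuilt from the sample input: dash runs of 7, 8, 7, 13 and 18,
# padded to the column widths 12, 12, 19, 34 and 25.
$sepLine = ('-' * 7).PadRight(12) + ('-' * 8).PadRight(12) + ('-' * 7).PadRight(19) +
           ('-' * 13).PadRight(34) + ('-' * 18).PadRight(25)

$pattern = '{0}+[^{0}]+' -f [regex]::Escape($sepChar)

# PowerShell converts the script block to a MatchEvaluator delegate;
# $m is the current match (one dash run plus the padding that follows it).
$reCaptureColumns = [regex]::Replace($sepLine, $pattern, {
    param($m)
    "(.{$($m.Value.Length)})"
})

$reCaptureColumns
```

For this separator line the result is the same regex quoted in the code comments above: (.{12})(.{12})(.{19})(.{34})(.{25}).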
