Parse multiple lines of text with PowerShell and export to CSV

I have multiple large log files that I'd like to export to CSV. To start with, I just want to split each entry into two parts, Date and Event. The problem I'm having is that not every line starts with a date.

Here is a sample chunk of the log. Date/times are always 23 characters. The rest varies with the log and event description.

[Image: sample log excerpt]

I'd like the end result to look like this in Excel.

[Image: desired result in Excel]

Here's what I've tried so far, but it just returns the first 23 characters of each line.

$content = Get-Content myfile.log -TotalCount 50
for ($i = 0; $i -lt $content.Length; $i++) {
    $a = $content[$i].ToCharArray()
    $b = ([string]$a[0..23]).replace(" ","")
    Write-Host $b
}

Read the file in raw as a multi-line string, then use a regex to split on the date pattern. For each chunk, make a custom object with the two properties that you want, where the first value is the first 23 characters and the second value is the rest of the string, trimmed.

(Get-Content C:\Path\To\File.csv -Raw) -split '(?m)(?=^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})'|
    Where{$_}|
    ForEach-Object{
        [PSCustomObject]@{
            'Col1'=$_.Substring(0,23)
            'Col2'=$_.Substring(23).Trim()
        }
    }

Then you can pipe that to a CSV, or do whatever you want with the data. If the files are truly massive this may not be viable, but I would think it should work OK on files up to a few hundred megabytes. Using your sample text, that outputs:

Col1                    Col2
----                    ----
2017-09-04 12:31:11.343 General BOECD:: ProcessStartTime: ...
2017-09-04 12:31:11.479 General MelsecIoWrapper: Scan ended: device: 1, ScanStart: 9/4/2017 12:31:10 PM Display: False
2017-09-04 12:31:11.705 General BOECD:: ProcessEndTime: ...
2017-09-04 12:31:13.082 General BOECD:: DV Data:

The ... at the end of two of the lines is where the multi-line value was truncated for display on screen; the value itself is intact.
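As a rough end-to-end sketch of the "pipe that to a CSV" step, the same split can feed Export-Csv. The sample text, its continuation lines, and the output path below are made up for illustration; the property names Date and Event are simply the two parts named in the question.

```powershell
# Hypothetical sample text standing in for a real log file.
# The indented lines are invented continuation lines.
$log = @'
2017-09-04 12:31:11.343 General BOECD:: ProcessStartTime:
    continuation line A
    continuation line B
2017-09-04 12:31:11.479 General MelsecIoWrapper: Scan ended
'@

# Split in front of every line that starts with a timestamp,
# drop the empty leading chunk, and build one object per entry.
$rows = $log -split '(?m)(?=^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})' |
    Where-Object { $_ } |
    ForEach-Object {
        [PSCustomObject]@{
            Date  = $_.Substring(0, 23)
            Event = $_.Substring(23).Trim()
        }
    }

# Hypothetical output location; any writable path would do.
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'parsed-log.csv'
$rows | Export-Csv -Path $outFile -NoTypeInformation

# Read the file back to confirm that multi-line events survive the round trip.
$check = @(Import-Csv -Path $outFile)
$check | Format-List
```

Export-Csv writes values that contain line breaks as quoted fields, so a multi-line event should still land in a single cell when the file is opened in Excel.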

(?=...) is a so-called "positive lookahead assertion". Such an assertion requires the given pattern to match at that position without actually including it in the returned match/string. In this case the match is the empty string just before a timestamp, so the string can be split there without removing the timestamp.
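To make the difference concrete, here is a small comparison of splitting on the timestamp pattern with and without the lookahead. The two-entry string and the variable names are invented for the demonstration.

```powershell
# Hypothetical two-entry sample; the second line continues the first entry.
$text  = "2017-09-04 12:31:11.343 first`n  continued`n2017-09-04 12:31:13.082 second"
$stamp = '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'

# Plain pattern: the matched timestamp is treated as the delimiter and is lost.
$consumed = @($text -split ('(?m)' + $stamp) | Where-Object { $_ })

# Lookahead: the match is zero-width, so each chunk keeps its timestamp.
$kept = @($text -split ('(?m)(?=' + $stamp + ')') | Where-Object { $_ })

$consumed[0]   # begins with ".343 first"
$kept[0]       # begins with "2017-09-04 12:31:11.343 first"
```

Both variants produce two chunks here; only the lookahead version leaves the date at the start of each chunk, which is what the Substring(0,23) call relies on.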
