
Parse multiple lines of text with PowerShell and export to CSV

I have multiple large log files that I'd like to export to CSV. To start with, I just want to split each entry into two parts: Date and Event. The problem I'm having is that not every line starts with a date.

Here is a sample chunk of the log. The date/time is always 23 characters long; the rest varies with the log and the event description.

[image: sample log excerpt]

I'd like the end result to look like this in Excel.

[image: desired result in Excel]

Here's what I've tried so far, but it just returns the first 23 characters of each line.

$content = Get-Content myfile.log -TotalCount 50
for ($i = 0; $i -lt $content.Length; $i++) {
    $a = $content[$i].ToCharArray()
    $b = ([string]$a[0..23]).replace(" ","")
    Write-Host $b
}

Read the file in as a single multi-line string with -Raw, then use a regular expression to split it on the date pattern. For each chunk, create a custom object with the two properties you want: the first value is the first 23 characters (the timestamp), and the second is the rest of the string, trimmed.

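# -Raw (PowerShell 3.0+) returns the whole file as one string instead of an array of lines.
# The split happens just before every line that starts with a timestamp; Where-Object drops empty chunks.
(Get-Content C:\Path\To\File.log -Raw) -split '(?m)(?=^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})' |
    Where-Object { $_ } |
    ForEach-Object {
        # $_ is one entry: timestamp, event text, and any continuation lines
        [PSCustomObject]@{
            'Col1' = $_.Substring(0, 23)       # the 23-character timestamp
            'Col2' = $_.Substring(23).Trim()   # the rest of the entry
        }
    }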

You can then pipe that to Export-Csv, or do whatever else you want with the data. Because the whole file is read into memory at once, this may not be viable if the files are truly massive, but I would expect it to work fine on files up to a few hundred megabytes. Using your sample text, it produced this output:

Col1                    Col2
----                    ----
2017-09-04 12:31:11.343 General BOECD:: ProcessStartTime: ...
2017-09-04 12:31:11.479 General MelsecIoWrapper: Scan ended: device: 1, ScanStart: 9/4/2017 12:31:10 PM Display: False
2017-09-04 12:31:11.705 General BOECD:: ProcessEndTime: ...
2017-09-04 12:31:13.082 General BOECD:: DV Data:

The ... at the end of two of the lines marks where PowerShell truncated the multi-line value to fit it on screen; the full value is still intact in the object.
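If the files are too large to read with -Raw, one alternative (a different technique from the -split approach above, sketched here as an assumption rather than tested against your actual logs) is to stream the file line by line and start a new record whenever a line begins with a full 23-character timestamp. The function name ConvertFrom-LogLine and the file names are hypothetical; the column names Date and Event follow the question.

```powershell
# Hypothetical helper (not part of the answer above): builds Date/Event records from
# log lines streamed through the pipeline, so the whole file is never held in memory.
function ConvertFrom-LogLine {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string]$Line
    )
    begin {
        # A full 23-character timestamp at the start of a line marks a new entry.
        $startPattern = '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
        $date = $null
        $body = New-Object System.Text.StringBuilder
    }
    process {
        if ($Line -match $startPattern) {
            # Emit the previous entry before starting a new one.
            if ($null -ne $date) {
                [PSCustomObject]@{ Date = $date; Event = $body.ToString().Trim() }
            }
            $date = $Line.Substring(0, 23)
            [void]$body.Clear()
            [void]$body.Append($Line.Substring(23))
        }
        elseif ($null -ne $date) {
            # Continuation line: attach it to the current entry.
            [void]$body.AppendLine()
            [void]$body.Append($Line)
        }
        # Lines that appear before the first timestamp are ignored.
    }
    end {
        # Emit the last entry.
        if ($null -ne $date) {
            [PSCustomObject]@{ Date = $date; Event = $body.ToString().Trim() }
        }
    }
}

# Example usage (hypothetical file names):
# Get-Content -Path .\myfile.log | ConvertFrom-LogLine |
#     Export-Csv -Path .\myfile.csv -NoTypeInformation
```

Appending the same Export-Csv call to the -split pipeline above writes its objects to a CSV file as well. Export-Csv quotes each field, so a multi-line event should stay in a single cell when the file is opened in Excel.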

(?=...) is a so-called positive lookahead assertion. It checks that the given pattern follows the current position without consuming any characters, so the pattern is not included in the match. Here the split pattern therefore matches the empty string just before each timestamp, and the string is split there without removing the timestamp. The (?m) prefix turns on multiline mode, so ^ matches at the start of every line rather than only at the start of the whole string.
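To see the difference the lookahead makes, this small sketch splits a made-up two-entry string twice: once with an ordinary pattern, which consumes the timestamps, and once with the lookahead pattern from the answer, which keeps them.

```powershell
# Made-up sample: two entries separated by a newline.
$text = "2017-09-04 12:31:11.343 A`n2017-09-04 12:31:11.479 B"

# Ordinary (consuming) pattern: each timestamp is part of the match, so it is removed.
$consumed = $text -split '(?m)^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'

# Lookahead pattern: matches the empty position before each timestamp, so it is kept.
$kept = @($text -split '(?m)(?=^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})' | Where-Object { $_ })

$consumed   # pieces without their timestamps
$kept       # two pieces, each still starting with its timestamp
```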
