
PowerShell script efficiency advice

It's my first time posting here, but I've been a long-time lurker, coming here every time I need help with some code.

I'm fairly new to PowerShell and, like most people, I've been teaching myself by coding whenever I can, so most of what I write is ugly but works. I'd like some advice on a script I wrote recently: at the time of this post it has been running for 15 hours and is only about 50% of the way through. Obviously something is wrong, but I don't know enough to pinpoint what it is, so any help will be greatly appreciated.

So, I have a telephony .csv with compiled data from January 2020 and a few days of February. Each row holds a date and the time spent in one status; since a person uses several statuses over the day, the file has one row per status. My script is supposed to go through the file, find the minimum date, and then save all the data for each day to its own new file, so I end up with one file for 01-01-2020, one for 02-01-2020, and so on. But it has been running for 15 hours and is still at 1/22.

The column I'm using for the dates is called "DateFull", and this is the script:

write-host "opening file" 
$AT= import-csv "C:\Users\xxxxxx\Desktop\SignOnOff_20200101_20200204.csv" 
write-host "parsing and sorting file" 
$go= $AT| ForEach-Object {
        $_.DateFull= (Get-Date $_.DateFull).ToString("M/d/yyyy")
        $_
        }

Write-Host "prep day"
$min = $AT | Measure-Object -Property Datefull  -Minimum  

Write-Host $min
$dateString =  [datetime] $min.Minimum
Write-host $datestring

write-host "Setup dates"
$start = $DateString - $today
$start = $start.Days

For ($i=$start; $i -lt 0; $i++)  {
$date = get-date
$loaddate = $date.AddDays($i) 
$DateStr = $loadDate.ToString("M/d/yyyy")
$now = Get-Date -Format HH:mm:ss
write-host $datestr " " $now

#Install-Module ImportExcel #optional import if you dont have the module already
$Check = $at | where {$_.'DateFull' -eq $datestr} 
write-host $check.count
if ($check.count -eq 0 ){}
else {$AT | where {$_.'DateFull' -eq $datestr} | Export-Csv "C:\Users\xxxxx\Desktop\signonoff\SignOnOff_$(get-date (get-date).addDays($i) -f yyyyMMdd).csv" -NoTypeInformation}
}

$at = '' 

Thank you so much for your help.

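The first loop doesn't make much sense. It loops through the CSV contents and converts each row's date into a different format, but afterwards $go is never used. (The assignment does rewrite DateFull on the original objects in $AT, because $_ refers to those same objects, so the conversion itself still takes effect; only the array collected into $go is wasted.)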

$go= $AT| ForEach-Object {
        $_.DateFull= (Get-Date $_.DateFull).ToString("M/d/yyyy")
        $_
        }
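If the in-place conversion is what you want, the pipeline output simply doesn't need to be captured. A minimal sketch with two made-up rows (the Agent column and the ISO-style input timestamps are assumptions, since the real file's format isn't shown). It formats with the invariant culture so that / stays a literal slash instead of becoming the current culture's date separator:

```powershell
# Hypothetical sample rows standing in for Import-Csv output
$AT = @(
    [pscustomobject]@{ Agent = 'a1'; DateFull = '2020-01-22 08:15:00' }
    [pscustomobject]@{ Agent = 'a2'; DateFull = '2020-01-03 17:40:00' }
)

$invariant = [System.Globalization.CultureInfo]::InvariantCulture

# $_ is the same object that is stored in $AT, so assigning to
# $_.DateFull changes $AT directly; there is nothing to capture
$AT | ForEach-Object {
    $_.DateFull = ([datetime]$_.DateFull).ToString('M/d/yyyy', $invariant)
}

# Show the converted values
$AT.DateFull
```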

Later, there is an attempt to calculate a value from an uninitialized variable: $today is never defined.

$start = $DateString - $today

It looks, however, like you want to calculate how many days old the eldest record is.
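A minimal sketch of that calculation with $today actually defined, assuming DateFull already holds M/d/yyyy strings (a [datetime] cast parses with the invariant culture, so 1/3/2020 means January 3). The sample rows are made up, and sorting the parsed dates and taking the first one is used instead of Measure-Object so the result doesn't depend on how Measure-Object treats non-numeric values:

```powershell
# Hypothetical sample rows, already converted to M/d/yyyy strings
$AT = @(
    [pscustomobject]@{ Agent = 'a1'; DateFull = '1/22/2020' }
    [pscustomobject]@{ Agent = 'a2'; DateFull = '1/3/2020' }
    [pscustomobject]@{ Agent = 'a3'; DateFull = '2/4/2020' }
)

# Parse each DateFull and keep the earliest DateTime
$oldest = $AT |
    ForEach-Object { [datetime]$_.DateFull } |
    Sort-Object |
    Select-Object -First 1

# Define $today explicitly, truncated to midnight so only whole days count
$today = (Get-Date).Date

# DateTime minus DateTime gives a TimeSpan; .Days is negative for past dates
$start = ($oldest - $today).Days
Write-Host "Oldest record: $($oldest.ToString('yyyy-MM-dd')), $start days from today"
```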

Then there's a loop that counts from that negative number of days up to zero. During each iteration, the whole CSV is searched:

$Check = $at | where {$_.'DateFull' -eq $datestr} 

If there are 30 days and 15,000 rows, that is 30 × 15,000 = 450,000 row comparisons. The cost is O(days × rows), which behaves like O(n^2) as both the date range and the file grow, so the runtime climbs steeply even for relatively modest numbers of days and rows.
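For comparison, a common way to avoid rescanning the whole array for every day (a different technique from the sort-based fix this answer recommends) is to bucket the rows by date in a single pass with a hashtable, so the work grows with the number of rows only. A minimal sketch; the Agent and Status columns and the sample values are made up:

```powershell
# Hypothetical sample rows standing in for the converted CSV data
$AT = @(
    [pscustomobject]@{ Agent = 'a1'; Status = 'Ready'; DateFull = '1/22/2020' }
    [pscustomobject]@{ Agent = 'a1'; Status = 'Break'; DateFull = '1/22/2020' }
    [pscustomobject]@{ Agent = 'a2'; Status = 'Ready'; DateFull = '1/3/2020' }
)

# One pass over the rows: file each row under its DateFull key
$byDate = @{}
foreach ($row in $AT) {
    if (-not $byDate.ContainsKey($row.DateFull)) {
        $byDate[$row.DateFull] = [System.Collections.Generic.List[object]]::new()
    }
    $byDate[$row.DateFull].Add($row)
}

# Getting one day's rows is now a hashtable lookup, not a full scan
$byDate['1/22/2020'].Count
```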

Next, the same array is processed again:

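else {$AT | where {$_.'DateFull' -eq $datestr} | Export-Csv "C:\Users\xxxxx\Desktop\signonoff\SignOnOff_$(get-date (get-date).addDays($i) -f yyyyMMdd).csv" -NoTypeInformation}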

The search condition is exactly the same, but this time the results are sent to a file, so every day's rows are searched for twice and your work doubles. Still, constant factors drop out of big-O notation (O(2n^2) is still O(n^2)), so at least the runtime isn't growing cubically or worse.
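A minimal sketch of filtering only once per day and reusing the result for the export. The sample rows, the temp output folder, and the fixed $datestr are assumptions standing in for the loop variables in the question:

```powershell
# Hypothetical sample rows and output folder
$AT = @(
    [pscustomobject]@{ Agent = 'a1'; DateFull = '1/22/2020' }
    [pscustomobject]@{ Agent = 'a2'; DateFull = '1/23/2020' }
)
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'signonoff_demo'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

# Stand-in for one iteration of the day loop
$datestr = '1/22/2020'
$fileDate = ([datetime]$datestr).ToString('yyyyMMdd')

# Filter once; @() guarantees an array, so .Count is reliable
$Check = @($AT | Where-Object { $_.DateFull -eq $datestr })
if ($Check.Count -gt 0) {
    # Export the rows already found instead of scanning $AT again
    $file = Join-Path $outDir ('SignOnOff_{0}.csv' -f $fileDate)
    $Check | Export-Csv -Path $file -NoTypeInformation
}
```

This halves the scanning but keeps one full scan per day, so it is a partial fix on its own.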

As for how to fix this, there are a few things. If you sort the CSV by date, it can then be processed in a single pass.

$at = $at | sort -Property datefull

Then iterate over each row. Sorting puts all rows with the same date next to each other. (DateFull is a string, so the order is alphabetical rather than chronological: "1/10/2020" sorts before "1/2/2020". That doesn't matter here, because only the grouping is needed.) For each row, check whether the date has changed. If not, add the row to the buffer. If it has, save the old buffer and start a new one.

The sample doesn't format the file names as yyyyMMdd, writes no header row, and assumes there are only two columns, foo and datefull, like so:

$sb = new-object text.stringbuilder
# Start with the date of the first (sorted) row
$current = $at[0]

# Loop through sorted data
for($i = 0; $i -lt $at.Count; ++$i) {

    # Are we on the next date? (-ne only checks for a change,
    # so it works whatever order the sort produced)
    if ($at[$i].DateFull -ne $current.datefull) {
        # Save the buffer for the previous date
        $file = "c:\temp\OnOff_{0}.csv" -f ($current.datefull -replace '/', '.')
        set-content $file $sb.tostring()
        # Pick the current date and start a new, empty buffer
        $current = $at[$i]
        $sb = new-object text.stringbuilder
    }
    # Add the row to the buffer of the current date
    [void]$sb.AppendLine(("{0},{1}" -f $at[$i].foo, $at[$i].datefull))
}
# Save the final buffer
$file = $("c:\temp\OnOff_{0}.csv" -f ($current.datefull -replace '/', '.') )
set-content $file $sb.tostring()
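If you don't need the hand-built buffer, a shorter alternative (a swapped-in technique, not the loop above) is Group-Object plus Export-Csv, which keeps every column, writes a header, and produces the yyyyMMdd file names from the question. A minimal sketch, assuming DateFull already holds M/d/yyyy strings; the sample rows and the temp output folder are made up:

```powershell
# Hypothetical sample rows; Export-Csv handles any number of columns
$AT = @(
    [pscustomobject]@{ Agent = 'a1'; Status = 'Ready'; DateFull = '1/22/2020' }
    [pscustomobject]@{ Agent = 'a1'; Status = 'Break'; DateFull = '1/22/2020' }
    [pscustomobject]@{ Agent = 'a2'; Status = 'Ready'; DateFull = '1/3/2020' }
)
# Assumed output folder
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'signonoff_grouped'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

$invariant = [System.Globalization.CultureInfo]::InvariantCulture

# Group-Object collects the rows for each DateFull value
$AT | Group-Object -Property DateFull | ForEach-Object {
    # Turn the M/d/yyyy group name into a yyyyMMdd file name
    $day = [datetime]::ParseExact($_.Name, 'M/d/yyyy', $invariant)
    $file = Join-Path $outDir ('SignOnOff_{0}.csv' -f $day.ToString('yyyyMMdd'))
    # Write all rows for that day, header included
    $_.Group | Export-Csv -Path $file -NoTypeInformation
}
```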
