
Export hash table to multiple columns in CSV in PowerShell

I have a large number of files on which I want to run a word analysis: counting how often each word appears in each file. As the final output I want a CSV file with the file names in the header and two columns per file: the word and its count.

file1 word, file1 count, file2 word, file2 count, ....
hello, 4, world, 5, ...
password, 10, save, 2, ...

To achieve this I open each file and store the word counts in a hash table. Because each hash table has a different length (a different number of unique words), I try to put the results in a data table to export them.

$file = Get-ChildItem -Recurse 

$out = New-Object System.Data.DataSet "ResultsSet"

foreach($f in $file){
$pres = $ppt.Presentations.Open($f.FullName, $true, $true, $false)
$id = $f.Name.substring(0,5)

$results = @{} #Hash table for this file
for($i = 4; $i -le $pres.Slides.Count; $i++){
    $s = $pres.Slides($i)
    $shapes = $s.Shapes 
    $textBox = $shapes | ?{$_.TextFrame.TextRange.Length -gt 100}

    if($textBox -ne $null){
        $textBox.TextFrame.TextRange.Words() | %{$_.Text.Trim()} | %{if(-not $results.ContainsKey("$_")){$results.Add($_,1)}else{$results["$_"] += 1 }}
    }
}

$pres.Close()

$dt = New-Object System.Data.DataTable
$dt.TableName = $id
[String]$dt.Columns.Add("$id Word")
[Int]$dt.Columns.Add("$id Count")
foreach($r in ($results.GetEnumerator() | sort Value)) {
    $dt.Rows.Add($r.Key, $r.Value)
}
$out.Tables.Add($dt)
}

$out | export-csv

There are two main issues:

  1. The number of unique words is different for each file (the hash tables have different lengths).
  2. Files are read one by one, so the results for each file need to be cached before being exported.

For some reason I do not get the output I want, only metadata. How can I achieve the correct output?

I took the time to write out a simulation of your situation.

# File names. The number of files should match the number of hash tables
$Files = 'file1','file2','file3','file4','file5'
# hash table results per file (simulated)
$HashPerFile = [ordered]@{ hello = 4; goodbye = 3; what = 1; is = 7; this = 4 },
     [ordered]@{ password = 2; hope = 1; they = 3; are = 2; not = 5; plain = 2; text = 18},
     [ordered]@{ help = 6; me = 2; please = 5 },
     [ordered]@{ decrypt = 1; the = 3; problem = 1 },
     [ordered]@{ because = 2; I = 5; cannot = 9 }
# Headers for the object output
$properties = $Files |% {"$_ word";"$_ count"}

# Determining max number of rows in results based on highest hash table length
$MaxRows = [linq.enumerable]::max([int[]]($hashperfile |% {$_.Count}))

# Precreating the result array $r
$r = 1..$MaxRows |% { "" | select $properties }

# Index of $properties. This helps select the correct 'file word' and 'file count' property
$pIndex = 0

# for loop to go through each file and hash table
for ($i = 0; $i -lt $Files.Count; $i++) {

    # rIndex is the index of the $r array.
    # When a new file is selected, this needs to reset to 0 so we can begin at the top of the $r array again.
    $rIndex = 0

    # Iterate the hash table that matches the file. Index $i ensures this.
    $HashPerFile[$i].GetEnumerator() |% {
        $r[$rIndex].$($properties[$pIndex]) = $_.Key
        $r[$rIndex++].$($properties[$pIndex+1]) = $_.Value
    }

    # Have to use +2 because there are two properties for each file
    $pIndex += 2
}

$r # Output
$r | Export-Csv output.csv -NoType # CSV output
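
The same layout can also be sketched as a reusable function, in case the hash tables come from the real per-file loop rather than a simulation. This is only a minimal sketch under assumptions: the function name ConvertTo-SideBySideRow and its -Name and -Table parameters are hypothetical and not part of the code above, and it assumes one label per hash table, passed in the same order.

```powershell
# Hypothetical helper (not from the answer above): lays several hash tables out
# side by side, two columns per table, padding shorter tables with $null.
function ConvertTo-SideBySideRow {
    param(
        [string[]]$Name,   # one label per table, e.g. the file IDs
        [object[]]$Table   # hash tables or ordered dictionaries, same order as $Name
    )

    # Materialize each table's entries so they can be addressed by row index.
    $entries = @(foreach ($t in $Table) { , @($t.GetEnumerator()) })

    # The longest table decides how many rows the output needs.
    $maxRows = ($entries | ForEach-Object { $_.Count } | Measure-Object -Maximum).Maximum

    for ($row = 0; $row -lt $maxRows; $row++) {
        $line = [ordered]@{}
        for ($i = 0; $i -lt $Name.Count; $i++) {
            $hasEntry = $row -lt $entries[$i].Count
            $line["$($Name[$i]) word"]  = if ($hasEntry) { $entries[$i][$row].Key }   else { $null }
            $line["$($Name[$i]) count"] = if ($hasEntry) { $entries[$i][$row].Value } else { $null }
        }
        # Emit one object per CSV row; every row carries every column.
        [pscustomobject]$line
    }
}

# Example input (made-up data in the shape of the question's sample output)
$names  = 'file1', 'file2'
$tables = [ordered]@{ hello = 4; password = 10 }, [ordered]@{ world = 5 }

$rows = ConvertTo-SideBySideRow -Name $names -Table $tables
$rows | Export-Csv output.csv -NoTypeInformation
```

Each emitted object becomes one CSV row, and because every row carries all columns, the header taken from the first object covers every file. Piping the DataSet from the question to Export-Csv instead serializes the DataSet object's own properties, which would explain the metadata-only output.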

