PowerShell Delete or Skip Columns in CSV

First of all: I'm a PowerShell rookie. I have little experience using PowerShell to modify or change CSV files.

Our system outputs an uncommon CSV format, which looks like this:

Example1;Example2;Name;Lastname;ContentOfExample1;ContentOfExample2;John;Doe

The header is repeated on every row, in front of the data. I want to get rid of some columns, such as Example1 and Example2.

As a second step I need to assign a new header:

-Header Name,Lastname,Address,Phone,.. and so on.

I'm thankful for any tips :-)

By definition, this pattern results in an even number of ";"-delimited elements. You can use that to your advantage by assigning properties to objects arithmetically, by index, and then re-emitting them to a new CSV file.

It might look something like this:

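Get-Content C:\Temp\InitialCSVFile.csv |
ForEach-Object{
    # Each line holds the header names followed by the same number of values
    $TempArr  = $_.Split( ';' )
    $Half     = $TempArr.Count / 2   # 4 for the sample row
    $TempHash = [Ordered]@{}
    For($i = 0; $i -lt $Half; ++$i)
    {
        # Pair header $i with the value $Half positions later
        $TempHash[ $TempArr[ $i ] ] = $TempArr[ $i + $Half ]
    }
    [PSCustomObject]$TempHash
} |
Export-CSV -path C:\Temp\TestCSV.csv -NoTypeInformation -Append -Delimiter ';'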

The code reads the file contents as plain strings, not as a semi-structured format like CSV. As each line is piped to ForEach-Object, the .Split() string method creates an array ( $_ -split ';' would work too). Then we instantiate an ordered hash table (dictionary) to hold the key/value pairs. Once that's done, a traditional For loop pairs up the key names and values. In the four-column sample, the name at element 0 has its value at element 0+4, that is, the index plus half the element count. Note: the loop is coded to stop at the halfway point in the array. That's why the even number of elements I mentioned earlier is important!
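To make the index arithmetic concrete, here is a minimal sketch that applies the same pairing to the sample line from the question, held in a string so no files are involved. The variable names $Line, $Half and $Record are illustrative only and not part of the answer above.

```powershell
# Sample row from the question: four header names followed by four values
$Line    = 'Example1;Example2;Name;Lastname;ContentOfExample1;ContentOfExample2;John;Doe'
$TempArr = $Line.Split(';')
$Half    = $TempArr.Count / 2      # 8 elements, so 4 headers and 4 values

$TempHash = [Ordered]@{}
for ($i = 0; $i -lt $Half; ++$i) {
    # Header at index $i pairs with the value at index $i + $Half
    $TempHash[ $TempArr[$i] ] = $TempArr[$i + $Half]
}

$Record = [PSCustomObject]$TempHash
$Record.Name        # John
$Record.Lastname    # Doe
```

If a line ever had an odd number of elements, the halves would no longer line up, so this sketch only holds under the even-count assumption stated earlier.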

Once the hash table is complete, the code casts it to [PSCustomObject] and sends it down the pipeline to Export-CSV, which of course deals in objects. This should result in a new CSV file whose records, shown here as a table, look something like:

Example1          Example2          Name Lastname
--------          --------          ---- --------
ContentOfExample1 ContentOfExample2 John Doe
ContentOfExample1 ContentOfExample2 John Doe

Note: Obviously the data is redundant because I just repeated your sample line in the input file. That shouldn't be a problem with your live data.

Note: It may not be necessary to recreate $TempHash for every line, since each key's value is reassigned on every pass of the inner loop. For now I'll let this example stand as is.

Update: To exclude properties:

$ExcludeProperties = @( 'Example1', 'Example2' )

Get-Content C:\Temp\InitialCSVFile.csv |
ForEach-Object{
    $TempArr  = $_.Split( ';' )
    $Half     = $TempArr.Count / 2   # 4 for the sample row
    $TempHash = [Ordered]@{}
    For($i = 0; $i -lt $Half; ++$i)
    {
        # Pair header $i with the value $Half positions later
        $TempHash[ $TempArr[ $i ] ] = $TempArr[ $i + $Half ]
    }
    [PSCustomObject]$TempHash
} |
# Drop the unwanted columns before writing the file
Select-Object -Property * -ExcludeProperty $ExcludeProperties |
Export-CSV -path C:\Temp\TestCSV.csv -NoTypeInformation -Append -Delimiter ';'

A strange way to output a CSV indeed..

What you could do is split the first line on the delimiter character ; in order to get the headers for each column.

Once you have that, the rest should not be too hard to do:

$csv = Get-Content -Path 'D:\Test\blah.csv' | Where-Object {$_ -match '\S'}

$parts = $csv[0] -split ';'
# calculate the number of parts that make up the headers
[int]$numberOfHeaders = $parts.Count / 2
# join the headers into a string
$header = $parts[0..($numberOfHeaders - 1)] -join ';'
# cut off the headers from every line
$rows = foreach ($line in $csv) { $line.Substring($header.Length + 1) }

# convert to an array of objects, skip the first two columns and export to a new file
$header, $rows | ConvertFrom-Csv -Delimiter ';' | 
    Select-Object * -ExcludeProperty $parts[0..1] | 
    Export-Csv -Path 'D:\Test\blah2.csv' -Delimiter ';' -NoTypeInformation

Assuming the number of columns can vary and the properties to exclude are known, you can do the following to parse your data as custom objects:

Get-Content file.csv | Foreach-Object {
    $count = 0 # Tracks column counts to split the row evenly
    $cols = $_ -split ';'
    # $headers gets the first half of the columns. $data gets the remainder.
    $headers,$data = $cols.where({$count++ -lt $cols.count/2},'Split')
    # Uses calculated properties to add your new properties. You will need to fill in your own logic since you provided none here.
    ($headers -join ';'),($data -join ';') | ConvertFrom-Csv -Delimiter ';' |
        Select-Object *,@{n='Address';e={'Electric Avenue'}},@{n='Phone';e={'867-5309'}} -ExcludeProperty Example1,Example2
}

If every row in the CSV file contains the same headers, you could just use Export-Csv to create a proper CSV from the data:

Get-Content file.csv | Foreach-Object {
    $count = 0 # Tracks column counts to split the row evenly
    $cols = $_ -split ';'
    $headers,$data = $cols.where({$count++ -lt $cols.count/2},'Split')
    ($headers -join ';'),($data -join ';') | ConvertFrom-Csv -Delimiter ';' |
        Select-Object *,@{n='Address';e={'Electric Avenue'}},@{n='Phone';e={'867-5309'}} -ExcludeProperty Example1,Example2
} | Export-Csv output.csv -NoTypeInformation

If each individual row could have a different number of columns, you will likely need a CSV file per row unless you parse all of the data and determine all the possible column names. If you want to keep the same format as the source but just want to manipulate columns and data, you can do the following, which will work with a varying number of columns:

Get-Content file.csv | Foreach-Object {
    $count = 0 # Tracks column counts to split the row evenly
    $cols = $_ -split ';'
    $headers,$data = $cols.where({$count++ -lt $cols.count/2},'Split')
    $newObj = ($headers -join ';'),($data -join ';') | ConvertFrom-Csv -Delimiter ';' |
        Select-Object *,@{n='Address';e={'Electric Avenue'}},@{n='Phone';e={'867-5309'}} -ExcludeProperty Example1,Example2
    "{0};{1}" -f ($newObj.psobject.properties.name -join ';'),($newObj.psobject.properties.value -join ';')
}
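
If rows really do carry different columns and a single CSV is still wanted, the other option mentioned above is to parse everything first, collect every column name, and then give each record the full set. A minimal sketch, assuming the same header-then-values layout on every line; the two sample lines and the names $Lines, $Records, $AllNames and $Csv are made up for illustration:

```powershell
# Hypothetical input: two rows with different column sets
$Lines = @(
    'Name;Lastname;John;Doe'
    'Name;Phone;Jane;867-5309'
)

# First pass: turn each line into an object (first half names, second half values)
$Records = foreach ($Line in $Lines) {
    $Cols = $Line -split ';'
    $Half = $Cols.Count / 2
    $Hash = [Ordered]@{}
    for ($i = 0; $i -lt $Half; ++$i) { $Hash[ $Cols[$i] ] = $Cols[$i + $Half] }
    [PSCustomObject]$Hash
}

# Collect every column name seen, in order of first appearance
$AllNames = $Records | ForEach-Object { $_.psobject.Properties.Name } | Select-Object -Unique

# Second pass: Select-Object adds any missing column as an empty property
$Csv = $Records | Select-Object -Property $AllNames | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
$Csv
```

Piping to Export-Csv instead of ConvertTo-Csv would write the result to a file. The trade-off is that all rows are held in memory before anything is written.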
