Check csv file values against every line in another csv file

I have two CSV files containing data. I need to check whether a value from CSV 1 exists in CSV 2 and, if so, replace that value in File2 with the data from File1; if not, just skip to the next row.

File1.csv

NO;Description
L001;DREAM
L002;CAR
L003;PHONE
L004;HOUSE
L005;PLANE

File2.csv

ID;Name;Status*;Scheduled Start Date;Actual Start Date;Actual End Date;Scheduled End Date;SLA
144862;DREAM;Scheduled;1524031200;;;1524033000;
149137;CAR;Implementation In Progress;1528588800;;;1548968400;
150564;PHONE;Scheduled;1569456000;;;1569542400;
150564;HOUSE;Scheduled;1569456000;;;1569542400;
150564;PLANE;;;;;;

I tried something like this, but it is not working for me:

$file1 = Import-Csv "C:\Users\file1.csv" |Select-Object -ExpandProperty Description
$file2 = Import-Csv "C:\Users\file1.csv" |Select-Object -ExpandProperty NO
Import-Csv "C:\Users\file3.csv" |Where-Object {$file1 -like $_.Name} |ForEach-Object {
    $_.Name = $file2($_.NO)
} |Out-File "C:\Users\File4.csv"

File4.csv should look like this:

ID;Name;Status*;Scheduled Start Date;Actual Start Date;Actual End Date;Scheduled End Date;SLA
144862;L001;Scheduled;1524031200;;;1524033000;
149137;L002;Implementation In Progress;1528588800;;;1548968400;
150564;L003;Scheduled;1569456000;;;1569542400;
150564;L004;Scheduled;1569456000;;;1569542400;
150564;L005;;;;;;

Maybe there is another way to achieve my goal! Thank you

Here's one approach you can take:

  • Import both CSV files with Import-Csv.
  • Create a lookup hash table from the first CSV file, where the Description values you want to replace are the keys and the NO values are the values.
  • Go through the second CSV file and replace each value in the Name column with its entry from the hash table, if the key exists. We can use System.Collections.Hashtable.ContainsKey to check if the key exists. This is a constant time O(1) operation, so lookups are fast.
  • Then we can export the final CSV with Export-Csv. I used -UseQuotes Never so that no " quotes are put in your output file. This feature is only available in PowerShell 7. For lower PowerShell versions, you can have a look at "How to remove all quotations mark in the csv file using powershell script?" for other alternatives to removing quotes from a CSV file.

Demo:

$csvFile1 = Import-Csv -Path .\File1.csv -Delimiter ";"
$csvFile2 = Import-Csv -Path .\File2.csv -Delimiter ";"

$ht = @{}
foreach ($item in $csvFile1) {
    if (-not [string]::IsNullOrEmpty($item.Description)) {
        $ht[$item.Description] = $item.NO
    }
}

& {
    foreach ($line in $csvFile2) {
        if ($ht.ContainsKey($line.Name)) {
            $line.Name = $ht[$line.Name]
        }
        $line
    }
} | Export-Csv -Path File4.csv -Delimiter ";" -NoTypeInformation -UseQuotes Never

Or, instead of wrapping the foreach loop inside a script block using the Call Operator &, we can use ForEach-Object. You can have a look at about_script_blocks for more information about script blocks.

$csvFile2 | ForEach-Object {
    if ($ht.ContainsKey($_.Name)) {
        $_.Name = $ht[$_.Name]
    }
    $_
} | Export-Csv -Path File4.csv -Delimiter ";" -NoTypeInformation -UseQuotes Never

File4.csv

ID;Name;Status*;Scheduled Start Date;Actual Start Date;Actual End Date;Scheduled End Date;SLA
144862;L001;Scheduled;1524031200;;;1524033000;
149137;L002;Implementation In Progress;1528588800;;;1548968400;
150564;L003;Scheduled;1569456000;;;1569542400;
150564;L004;Scheduled;1569456000;;;1569542400;
150564;L005;;;;;;
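
For Windows PowerShell 5.1, where -UseQuotes is not available, one possible workaround is to convert the rows to CSV text with ConvertTo-Csv and strip the quotes from each line before writing the file. The following is only a minimal sketch: the sample rows are hypothetical stand-ins for the already-updated File2 data, and the approach assumes that no field contains the ; delimiter or a literal " character, because removing every quote would corrupt such fields.

```powershell
# Hypothetical sample rows standing in for the already-updated File2 data.
$rows = @'
ID;Name;Status*
144862;L001;Scheduled
149137;L002;Implementation In Progress
'@ | ConvertFrom-Csv -Delimiter ';'

# ConvertTo-Csv quotes every field by default, so strip the quotes line by line.
# Caution: this also removes quotes that protect fields containing ';' or '"'.
$lines = $rows |
    ConvertTo-Csv -Delimiter ';' -NoTypeInformation |
    ForEach-Object { $_ -replace '"', '' }

# Write the unquoted lines to the output file.
$lines | Set-Content -Path .\File4.csv
```

If your data can contain the delimiter inside a field, keeping the quotes is the safer choice.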

Update

For handling multiple values with the same Name, we can transform the above to use a hash table of System.Management.Automation.PSCustomObject, where each object has two properties: Count, to keep track of the current item we're seeing, and NO, which is an array of NO values:

$csvFile1 = Import-Csv -Path .\File1.csv -Delimiter ";"
$csvFile2 = Import-Csv -Path .\File2.csv -Delimiter ";"

$ht = @{}
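foreach ($row in $csvFile1) {
    # Skip rows without a description; they cannot be used as lookup keys
    if ([string]::IsNullOrEmpty($row.Description)) {
        continue
    }
    # Create the entry the first time a description is seen
    if (-not $ht.ContainsKey($row.Description)) {
        $ht[$row.Description] = [PSCustomObject]@{
            Count = 0
            NO = @()
        }
    }
    $ht[$row.Description].NO += $row.NO
}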

& {
    foreach ($line in $csvFile2) {
        if ($ht.ContainsKey($line.Name)) {
            $name = $line.Name
            $pos = $ht[$name].Count
            $line.Name = $ht[$name].NO[$pos]
            $ht[$name].Count += 1
        }
        $line
    }
} | Export-Csv -Path File4.csv -Delimiter ";" -NoTypeInformation -UseQuotes Never
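
As a variation on the idea above (not the method used in this answer), the counter can be avoided by storing a queue of NO values per description and dequeuing one value for each matching row. The sketch below uses hypothetical inline sample data in place of the two files, and it assumes that rows in File2 should be left unchanged once all NO values for their Name have been used up.

```powershell
# Hypothetical sample data: two File1 rows share the description CAR.
$csvFile1 = @'
NO;Description
L001;DREAM
L002;CAR
L003;CAR
'@ | ConvertFrom-Csv -Delimiter ';'

$csvFile2 = @'
ID;Name
1;DREAM
2;CAR
3;CAR
4;CAR
5;BOAT
'@ | ConvertFrom-Csv -Delimiter ';'

# Map each description to a queue of its NO values, in file order.
$ht = @{}
foreach ($row in $csvFile1) {
    if ([string]::IsNullOrEmpty($row.Description)) { continue }
    if (-not $ht.ContainsKey($row.Description)) {
        $ht[$row.Description] = [System.Collections.Generic.Queue[string]]::new()
    }
    $ht[$row.Description].Enqueue($row.NO)
}

$result = foreach ($line in $csvFile2) {
    # Replace only while unused NO values remain; otherwise keep the row as it is.
    if ($ht.ContainsKey($line.Name) -and $ht[$line.Name].Count -gt 0) {
        $line.Name = $ht[$line.Name].Dequeue()
    }
    $line
}

$result.Name -join ','
```

With this sample data the last line prints L001,L002,L003,CAR,BOAT: the third CAR row stays unchanged because only two NO values exist for CAR, and BOAT has no entry in the lookup table at all.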

If your files aren't too big, you could do this with a simple ForEach-Object loop:

$csv1   = Import-Csv -Path 'D:\Test\File1.csv' -Delimiter ';'
$result = Import-Csv -Path 'D:\Test\File2.csv' -Delimiter ';' | 
          ForEach-Object {
              $name = $_.Name
              $item = $csv1 | Where-Object { $_.Description -eq $name } | Select-Object -First 1
              # update the Name property and output the item
              if ($item) { 
                $_.Name = $item.NO
                # if you output the row here, the result will NOT contain rows that did not match
                # $_   
              }
              # if on the other hand, you would like to retain the items that didn't match unaltered,
              # then output the current row here
              $_
          }

# output on screen
$result | Format-Table -AutoSize

# output to new CSV file
$result | Export-Csv -Path 'D:\Test\File4.csv' -Delimiter ';' -NoTypeInformation

Result on screen:

ID     Name Status*                    Scheduled Start Date Actual Start Date Actual End Date Scheduled End Date SLA
--     ---- -------                    -------------------- ----------------- --------------- ------------------ ---
144862 L001 Scheduled                  1524031200                                             1524033000            
149137 L002 Implementation In Progress 1528588800                                             1548968400            
150564 L003 Scheduled                  1569456000                                             1569542400            
150564 L004 Scheduled                  1569456000                                             1569542400            
150564 L005
