
Powershell - How to search (using wildcard) and replace values in a CSV file?

I have a CSV file (one column/field only) with thousands of records in it.

I need a way in Powershell to search for a value using a few characters followed by a wildcard and, where found, replace that value with a double quote (").

I have searched around on how to do this, but everything I have found so far either doesn't cover CSV files or doesn't explain how I might be able to do the search using a wildcard.

Example of values in the CSV file:

<#

RanDom.Texto 1.yellow [ Table - wood ] "gibberishcode1.moreRandomText11.xyz123+456"
R@ndomEq.Textolo 2.blue [Chair - steel ] "gibberishcode2.moreRandomText222.xyz19283+4567+89
randomi.Textpel 3.green [ counter - granite] "gibberishcode3.moreRandomText3333.xyz17243+3210+987+654"

#>

You will note above that the only value in common across the records is the .xyz in each record.

I want to replace the .xyz (and everything that follows it) with a " value.

E.g. the desired result is as follows:

<#

RanDom.Texto 1.yellow [ Table - wood ] "gibberishcode1.moreRandomText11"
R@ndomEq.Textolo 2.blue [Chair - steel ] "gibberishcode2.moreRandomText222"
randomi.Textpel 3.green [ counter - granite] "gibberishcode3.moreRandomText3333"

#>

Here is some code I tried. It doesn't work, in that it didn't replace the values (although it does successfully export to a new csv file).

# Create function that gets the current file path (of where this script is located)
function Get-ScriptDirectory {Split-Path -parent $PSCommandPath}

# Create function that gets the current date and time in format of 1990-07-01_19h15m59
function Get-TimeStamp {return "{0:yyyy-MM-dd}_{0:HH}h{0:mm}m{0:ss}" -f (Get-Date)}

# Set current file path. Also used in both FOR loops below as primary source directory.
${sourceDirPath} = Get-ScriptDirectory

# Import CSV look-up file 
${csvFile} = (Import-Csv -Path ${sourceDirPath}\SourceCSVFile.csv)
    
# for each row, replace the values of .xyz and all that follows with "
foreach(${row} in ${csvFile}) 
{
    ${row} = ${row} -replace '.xyz*','"'
}

# Set modified CSV's name and path
${newCSVFile} = ${sourceDirPath} + '\' + $(Get-TimeStamp) + '_SourceCSVFile_Modified.csv'

# export the modified CSV
${csvFile} | Export-Csv ${newCSVFile} -NoTypeInformation
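A note on why the loop above leaves the data unchanged: assigning to ${row} only rebinds the loop variable, so the objects in ${csvFile} are never modified, and -replace applied to a whole row object works on its string form rather than on a column. A minimal sketch of an object-based variant follows, assuming a hypothetical column named Value and made-up rows standing in for the Import-Csv output (neither appears in the question):

```powershell
# Hypothetical rows standing in for Import-Csv output; 'Value' is an assumed column name.
$rows = @(
    [pscustomobject]@{ Value = 'code1.moreText11.xyz123+456' }
    [pscustomobject]@{ Value = 'code2.moreText222.xyz19283+4567+89' }
)

foreach ($row in $rows) {
    # Assign back to the property; reassigning $row itself would not touch $rows.
    $row.Value = $row.Value -replace '\.xyz.*$', '"'
}

# Print the cleaned values.
$rows | ForEach-Object { $_.Value }
```

With real data, the property name would have to match the header of the actual CSV file.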

I also tried this as an alternative, but no luck either (I think the code below may only work for .txt files?)...

((Get-Content -path C:\TEMP\TEST\SourceCSVFile.csv -Raw) -replace '.xyz'*,'"') | Export-Csv -Path C:\TEMP\TEST\ReplacementFile.csv

I'm new to Powershell and don't have a proper understanding of regex yet, so please be gentle.

UPDATE and SOLUTION:

For those who are interested in my final solution: I used the code provided by Thomas (thank you!!), however my .csv file was left with some records that had a triple quote (""") at the end of the string.

As such, I modified the code to use variables and to run a second cleaning pass that replaces every triple quotation mark (e.g. """) with a single double-quote character (e.g. "), and then pipes the result to a file.

# Create function that gets the current file path (of where this script is located and running from)
function Get-ScriptDirectory {Split-Path -parent $PSCommandPath}

# Set current file path
${sourceDirPath} = Get-ScriptDirectory

# Assign source .csv file name to variable
$origNameSource = 'AllNames.csv'

# Assign desired .csv file name post cleaning
$origNameCLEAN = 'AllNames_CLEAN.csv'
    
# First pass clean to replace .xyz* with " and assign result to tempCsvText variable
${tempCsvText} = ((Get-Content -Path ${sourceDirPath}\$origNameSource) | % {$_ -replace '\.xyz.*$', '"'})

# Second pass clean to replace """ with " and write result to a new .csv file
${tempCsvText} -replace '"""', '"' | Set-Content -Path ${sourceDirPath}\$origNameCLEAN

# Import records from new .csv file and remove duplicates by using Sort-Object * -Unique
${csvFile} = (Import-Csv -Path ${sourceDirPath}\$origNameCLEAN) | Sort-Object * -Unique
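The two cleaning passes can also be chained in one expression, because -replace operators are evaluated left to right. The sketch below runs on in-memory strings instead of a file. Both sample lines are hypothetical, and the second one is invented purely to show the triple-quote case, since the question does not show such a record:

```powershell
# Hypothetical input lines (not taken from the real file).
$lines = @(
    'a "code1.text11.xyz123+456"'
    'b "code2.text222"""'
)

# First -replace trims '.xyz' and the rest of the line; the second collapses """ into ".
$clean = $lines | ForEach-Object { $_ -replace '\.xyz.*$', '"' -replace '"""', '"' }

$clean
```

In the script above, the same chained expression could take the place of the intermediate ${tempCsvText} variable, with the result piped to Set-Content as before.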

First, a .csv file is nothing more than a regular text file that follows some rules on how content is laid out (one line per row, columns separated by a defined delimiter character, optional header). Your last line is close. You have to use a regular expression that matches up to the end of the line. This will do it:

Get-Content -Path C:\TEMP\TEST\SourceCSVFile.csv | % {$_ -replace '\.xyz.*$', '"'} | Set-Content -Path C:\TEMP\TEST\ReplacementFile.csv

Differences:

  • I removed the -Raw parameter to get each line as one string.
  • I used the pipeline to process each string (line).
  • I adjusted your regex to match from .xyz until the end of each line.
  • I piped the result to Set-Content, as I only did text replacement and did not read any objects that would then have to be converted back to csv text by Export-Csv.
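To make the regex point concrete, here is a small sketch comparing the pattern from the question with the adjusted one on a single sample string taken from the question's data. The variable names are only illustrative. The -replace operator always treats its pattern as a regular expression, whereas wildcard syntax such as *.xyz* belongs to operators like -like, which only test for a match:

```powershell
$sample = 'gibberishcode1.moreRandomText11.xyz123+456'

# Pattern from the question: '.' is any character and 'z*' means zero or more 'z',
# so only the four characters ".xyz" are matched and replaced.
$attempt = $sample -replace '.xyz*', '"'

# Adjusted pattern: '\.' is a literal dot and '.*$' consumes the rest of the line.
$fixed = $sample -replace '\.xyz.*$', '"'

# Wildcards work with -like, which returns a Boolean rather than a modified string.
$hasXyz = $sample -like '*.xyz*'

$attempt   # gibberishcode1.moreRandomText11"123+456
$fixed     # gibberishcode1.moreRandomText11"
$hasXyz    # True
```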
