Trapping regex matches in an array with PowerShell
We have a large .vcf file that we exported from a Mac user's computer.
The export process produced a single .vcf that bunches all the contacts into one file. I used Notepad++ to replace all instances of "BEGIN:" with "\\nBEGIN:" so that I can sleep tonight.
The plan is to put each match of my regular expression into an array, then Out-File each string into its own uniquely named .vcf file.
(I was planning on adding the strings "BEGIN:VCARD" and "END:VCARD" back to the beginning and end of each file later.)
This is a snippet of the data we are working with:
BEGIN:VCARD
VERSION:3.0
PRODID:-//Apple Inc.//Mac OS X 10.13.4//EN
N:;;;;
TEL;type=CELL;type=VOICE;type=pref:+18005555555
UID:3fe8e0-421c-4c6a-bfa-38c75df8c07
X-ABUID:3FE8490-421C-4C6A-B2FA-38C15DF8C07:ABPerson
END:VCARD
BEGIN:VCARD
VERSION:3.0
PRODID:-//Apple Inc.//Mac OS X 10.13.4//EN
N:;<blah@company.org>;;;
FN:<blah@company.org>
item1.EMAIL;type=INTERNET;type=pref:blah@company.org
item1.X-ABLabel:_$!<Other>!$_
UID:5ad596-a879-4c98-9f56-2ef90efe32f
X-ABUD:DB5C20C-6DFC-450F-A752-D57964F6F3A:ABPerson
END:VCARD
...
I got close with the code below, but it only returns the first match:
$String = cat C:\temp\contacts.txt
$Regex = [Regex]::new("(?<=BEGIN:VCARD)(.*?)(?=END:VCARD)")
$Match = $Regex.Match($String)
if($Match.Success)
{
$Match.Value
}
always cite your source
I need it to parse the entire string and find all matches, like this dude did:
$matches_found = @()
cat myfile.txt | %{
if ($_ -match '(?<=BEGIN:VCARD)(.*?)(?=END:VCARD)'){
$matches_found += $matches[1]
}
}
always cite your source
but when I put my regex into this code, it doesn't find any matches.
You are only asking for a single match in each of the code blocks you posted. You'd want to use the RegEx Matches method instead.
This should get you what you are after:
$VCardData = @'
BEGIN:VCARD
VERSION:3.0
PRODID:-//Apple Inc.//Mac OS X 10.13.4//EN
N:;;;;
TEL;type=CELL;type=VOICE;type=pref:+18005555555
UID:3fe8e0-421c-4c6a-bfa-38c75df8c07
X-ABUID:3FE8490-421C-4C6A-B2FA-38C15DF8C07:ABPerson
END:VCARD
BEGIN:VCARD
VERSION:3.0
PRODID:-//Apple Inc.//Mac OS X 10.13.4//EN
N:;<blah@company.org>;;;
FN:<blah@company.org>
item1.EMAIL;type=INTERNET;type=pref:blah@company.org
item1.X-ABLabel:_$!<Other>!$_
UID:5ad596-a879-4c98-9f56-2ef90efe32f
X-ABUD:DB5C20C-6DFC-450F-A752-D57964F6F3A:ABPerson
END:VCARD
'@
# Use RegEx match to search for strings across line breaks.
$VcardRegEx = '(?s)(?<=BEGIN:VCARD).*?(?=END:VCARD)'
# Select all matches
[RegEx]::Matches($VCardData,$VcardRegEx).Value
#results
VERSION:3.0
PRODID:-//Apple Inc.//Mac OS X 10.13.4//EN
N:;;;;
TEL;type=CELL;type=VOICE;type=pref:+18005555555
UID:3fe8e0-421c-4c6a-bfa-38c75df8c07
X-ABUID:3FE8490-421C-4C6A-B2FA-38C15DF8C07:ABPerson
VERSION:3.0
PRODID:-//Apple Inc.//Mac OS X 10.13.4//EN
N:;<blah@company.org>;;;
FN:<blah@company.org>
item1.EMAIL;type=INTERNET;type=pref:blah@company.org
item1.X-ABLabel:_$!<Other>!$_
UID:5ad596-a879-4c98-9f56-2ef90efe32f
X-ABUD:DB5C20C-6DFC-450F-A752-D57964F6F3A:ABPerson
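To make the difference concrete, here is a minimal sketch contrasting Match() with Matches() on a tiny two-card string. The sample text and the variable names ($text, $pattern, $first, $all, $bodies) are made up for illustration and are not part of the original post.

```powershell
# Hypothetical two-card sample, standing in for the output of Get-Content -Raw
$text = "BEGIN:VCARD`nFN:A`nEND:VCARD`nBEGIN:VCARD`nFN:B`nEND:VCARD`n"

# (?s) lets '.' match line breaks, so a match can span several lines
$pattern = '(?s)(?<=BEGIN:VCARD).*?(?=END:VCARD)'

$first = [regex]::Match($text, $pattern)     # one Match object: the first hit only
$all   = [regex]::Matches($text, $pattern)   # MatchCollection: every hit

# Collect the trimmed body of every card into an array
$bodies = @($all | ForEach-Object { $_.Value.Trim() })

$first.Value.Trim()   # FN:A
$all.Count            # 2
$bodies               # FN:A, FN:B

# Without (?s) the '.' stops at each line break, so nothing matches here
[regex]::Matches($text, '(?<=BEGIN:VCARD).*?(?=END:VCARD)').Count   # 0
```

Note that Get-Content without -Raw returns an array of lines. Testing each line with -match cannot find a pattern that spans lines, and passing that array to a [string] parameter joins the lines with spaces, which is probably why the first attempt in the question still produced one match.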
Update as per the OP's follow-up question:
# How many records are in the set
([RegEx]::Matches($VCardData,$VcardRegEx).Value).Count
# Results
2
# Output each record as a separate file
# Set the counter
$VCardCounter = 0
# Loop through the dataset and output to a new file for each
ForEach($Vcard in ([RegEx]::Matches($VCardData,$VcardRegEx).Value))
{
$VCardFileName = 'VCard' + ++$VCardCounter + ".txt"
New-Item -Path $pwd -ItemType File -Name $VCardFileName
Add-Content -Value $Vcard -Path "$pwd\$VCardFileName"
}
# List the new files
Get-ChildItem -Path "$pwd\Vcard*"
Directory: D:\Scripts
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 03-Jun-18 15:36 209 VCard1.txt
-a---- 03-Jun-18 15:36 286 VCard2.txt
# Review the contents of the new files
Get-Content (Get-ChildItem -Path "$pwd\Vcard*")
# Results
VERSION:3.0
PRODID:-//Apple Inc.//Mac OS X 10.13.4//EN
N:;;;;
TEL;type=CELL;type=VOICE;type=pref:+18005555555
UID:3fe8e0-421c-4c6a-bfa-38c75df8c07
X-ABUID:3FE8490-421C-4C6A-B2FA-38C15DF8C07:ABPerson
VERSION:3.0
PRODID:-//Apple Inc.//Mac OS X 10.13.4//EN
N:;<blah@company.org>;;;
FN:<blah@company.org>
item1.EMAIL;type=INTERNET;type=pref:blah@company.org
item1.X-ABLabel:_$!<Other>!$_
UID:5ad596-a879-4c98-9f56-2ef90efe32f
X-ABUD:DB5C20C-6DFC-450F-A752-D57964F6F3A:ABPerson
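Because the lookaround pattern strips the BEGIN:VCARD and END:VCARD lines, the files above are not complete vCards yet. A rough sketch of putting the wrappers back, as the question planned to do, could look like this. The sample text and the names $text, $pattern and $cards are assumptions for illustration, and the CRLF line endings are my choice, not something the original post specifies.

```powershell
# Hypothetical two-card sample, standing in for the real file content
$text = "BEGIN:VCARD`nFN:A`nEND:VCARD`nBEGIN:VCARD`nFN:B`nEND:VCARD`n"
$pattern = '(?s)(?<=BEGIN:VCARD).*?(?=END:VCARD)'

# Re-wrap every captured body with the delimiters the lookarounds left out
$cards = @(
    foreach ($m in [regex]::Matches($text, $pattern)) {
        "BEGIN:VCARD`r`n" + $m.Value.Trim() + "`r`nEND:VCARD`r`n"
    }
)

$cards.Count   # 2
$cards[0]      # BEGIN:VCARD / FN:A / END:VCARD on three lines
```

Each element of $cards can then be written out the same way as in the loop above.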
This PowerShell script splits the file into chunks that begin with BEGIN:VCARD. Each chunk is written to a file named after its UID or, if no UID is found, to NoUID#0000.vcf with an incrementing counter.
EDIT: a simplified variant that uses only a counter for the output file name:
## Q:\Test\2018\06\02\SO_50659915.ps1
$InFile = '.\sample.vcf'
$Delimiter = 'BEGIN:VCARD'
$Split = "(?!^)(?=$Delimiter)"
(Get-Content $InFile -Raw) -split $Split | ForEach-Object {$I=0}{
$I++
$_ | Out-File -FilePath ("Whatever{0:0000}.vcf" -f $I) -Encoding UTF8
}
The variant that names each file after its UID:
## Q:\Test\2018\06\02\SO_50659915.ps1
$InFile = '.\sample.vcf'
$Delimiter = 'BEGIN:VCARD'
# If the Delimiter contains chars that would be interpreted as special RE chars
# they need to be escaped, either manually or with the following command
# $Escaped = [regex]::Escape($Delimiter)
$Split = "(?!^)(?=$Delimiter)"
(Get-Content $InFile -Raw) -split $Split | ForEach-Object {$I=0}{
if ($_ -match 'UID:(?<UID>[0-9a-f\-]{32})'){
$_ | Out-File -FilePath ($Matches.UID+".vcf") -Encoding UTF8
} else {
$I++
$_ | Out-File -FilePath ("NoUID#{0:0000}.vcf" -f $I) -Encoding UTF8
}
}
Sample resulting output:
> ls
Directory: Q:\Test\2018\06\02
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 2018-06-03 20:05 236 3fe8e0-421c-4c6a-bfa-38c75df8c07.vcf
-a---- 2018-06-03 20:05 311 5ad596-a879-4c98-9f56-2ef90efe32.vcf
-a---- 2018-06-03 20:05 236 NoUID#0001.vcf
-a---- 2018-06-03 20:05 311 NoUID#0002.vcf
-a---- 2018-06-02 21:45 537 sample.vcf
-a---- 2018-06-03 19:41 416 SO_50659915.ps1
> Get-Content .\3fe8e0-421c-4c6a-bfa-38c75df8c07.vcf
BEGIN:VCARD
VERSION:3.0
PRODID:-//Apple Inc.//Mac OS X 10.13.4//EN
N:;;;;
TEL;type=CELL;type=VOICE;type=pref:+18005555555
UID:3fe8e0-421c-4c6a-bfa-38c75df8c07
X-ABUID:3FE8490-421C-4C6A-B2FA-38C15DF8C07:ABPerson
END:VCARD
> Get-Content .\5ad596-a879-4c98-9f56-2ef90efe32.vcf
BEGIN:VCARD
VERSION:3.0
PRODID:-//Apple Inc.//Mac OS X 10.13.4//EN
N:;<blah@company.org>;;;
FN:<blah@company.org>
item1.EMAIL;type=INTERNET;type=pref:blah@company.org
item1.X-ABLabel:_$!<Other>!$_
UID:5ad596-a879-4c98-9f56-2ef90efe32f
X-ABUD:DB5C20C-6DFC-450F-A752-D57964F6F3A:ABPerson
END:VCARD
>
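For anyone wondering what the split pattern does: (?=BEGIN:VCARD) is a zero-width lookahead, so the split happens just before each delimiter and the delimiter stays in the chunk, while (?!^) prevents a split at the very start of the string, which could otherwise leave an empty leading element. A small sketch on a made-up two-card string ($text and $chunks are illustrative names only):

```powershell
# Hypothetical two-card sample, standing in for Get-Content -Raw output
$text = "BEGIN:VCARD`nFN:A`nEND:VCARD`nBEGIN:VCARD`nFN:B`nEND:VCARD`n"

# Split in front of every BEGIN:VCARD except the one at position 0
$chunks = $text -split '(?!^)(?=BEGIN:VCARD)'

$chunks.Count                           # 2
$chunks[0].StartsWith('BEGIN:VCARD')    # True
$chunks[1].Trim()                       # second card, delimiters included
```

Since nothing is consumed by the split, joining the chunks again gives back the original text.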
LotPings beat me to it. Anyway, here's my solution:
# Enter the full path and filename of your large combined vcf file here
$InputFile = '<The full path and filename to your vcf file>'
# The path where you want the output vcf files. Below defaults to a folder 'VCards' within your Temp directory
$OutputPath = Join-Path $env:TEMP 'VCards'
# Read the input file in a single string
$VCardData = Get-Content $InputFile -Raw
# Create the output folder if it does not already exist
if (!(Test-Path $OutputPath -PathType Container)) {
New-Item -ItemType Directory -Force -Path $OutputPath | Out-Null
}
# Use RegEx match to search for strings across line breaks.
# This regex will keep the "BEGIN:VCARD" and "END:VCARD" for each array element intact
$VcardRegex = '(?s)((?:BEGIN:VCARD).*?(?:END:VCARD))'
# This regex is for parsing out the UID value of the vcard if present
$UidRegex = '\b(?:UID:)(?:urn:)?(?:uuid:)?([0-9a-f\-]*)\b'
# Select all matches
$VCardArray = [RegEx]::Matches($VCardData,$VcardRegex).Value
# Save results to $OutputPath as separate .vcf files
# using the UID value as filename. If no UID is found in the VCard element,
# a safety name is generated using a simple counter $i.
# Each file is written as UTF-8. In Windows PowerShell, the Set-Content cmdlet with option -Encoding UTF8
# creates files prefixed with a byte order mark (BOM).
# Because it is usually advisable to create the file without the BOM, I use [System.IO.File]::WriteAllText
# with an encoding object.
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $false
$i = 0
$VCardArray | ForEach-Object {
if ($_ -match $UidRegex) {
$fileName = $matches[1] + '.vcf'
}
else {
$fileName = 'Vcard_{0:000}.vcf' -f $i++
}
$fileOut = Join-Path $OutputPath $fileName
try {
[System.IO.File]::WriteAllText($fileOut, $_, $Utf8NoBomEncoding)
Write-Host "Saved file '$fileOut'"
}
catch {
Write-Error "Could not write file '$fileOut':`r`n$($_.Exception.Message)"
}
}
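If you want to confirm that a file really was written without a BOM, a quick check is to look at its first three bytes. This is only a sketch: the file name bom_check_example.vcf and the variables $enc, $path, $bytes and $hasBom are made up for the example.

```powershell
# Same kind of encoding object as above: UTF-8 without a byte order mark
$enc  = New-Object System.Text.UTF8Encoding $false
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bom_check_example.vcf'

[System.IO.File]::WriteAllText($path, "BEGIN:VCARD`r`nEND:VCARD`r`n", $enc)

# A UTF-8 BOM is the byte sequence EF BB BF at the start of the file
$bytes  = [System.IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and
          $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF

$hasBom   # False

Remove-Item $path   # clean up the temporary example file
```

Here the file starts directly with the B of BEGIN:VCARD (byte 0x42).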