powershell - remove string containing line breaks and spaces

I have a script running in PowerShell (v2) that removes strings from a file.

The basic process is:

(Get-Content $Local_Dir1\$filename1) -replace 'longString', 'shortString' | `
Set-Content $cfg_Local_Dir\$filename1

Get-Content $Local_Dir1\$filename1 | `
            Where-Object {$_ -notmatch 'stringToMatch'} | `
            Where-Object {$_ -notmatch 'secondStringToMatch'} | `
            Set-Content $Local_Dir1\$filename

This works fine. However, I have an annoying string that I can't get rid of.

It basically consists of a carriage return and line feed (CRLF), 4 spaces, and then another carriage return and line feed. In hex it is 0D 0A 20 20 20 20 0D 0A.

How can I remove this?

I tried simply:

Where-Object {$_ -notmatch '    '} #4 x spaces

But that removed all content after that line (and this is on the second line).

I looked at:

Where-Object {$_ -notmatch '$([char]0x0D)'}

(I would have expanded it if it had removed all the Carriage Returns) which I saw in another post somewhere, but that did nothing.

What is the correct way of dealing with this problem?


Additional: 2015-11-24 13:49

Example Data:

<?xml version="1.0" encoding="UTF-8"?>

<start_of_data>
        <job>123456</job>
        <name>ABC123</name>
        <start></start> 
</start_of_data> 
<start_of_data>
        <job>789012</job>
        <name>DEF345</name>
        <start></start> 
</start_of_data>

Initially there is a string on line 2, which is removed by 'stringToMatch', and the spaces are on line 3.

A couple of things are worth pointing out here. When you use -match / -notmatch you are using regex, so we can consolidate your strings and the space issue into one pattern.

Get-Content $Local_Dir1\$filename1 | 
    Where-Object {$_ -notmatch 'stringToMatch|secondStringToMatch|\s{4,}'} | 
    Set-Content $Local_Dir1\$filename

This uses alternation to match any of the elements separated by pipes. It is by no means perfect, as we don't have sample data to work with, but lines containing either of those two strings or at least 4 consecutive white-space characters will be omitted.
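To make the effect of that pattern concrete, here is a minimal sketch run against a few in-memory lines. The sample lines are hypothetical (loosely modelled on the example data in the question), and 'stringToMatch' / 'secondStringToMatch' are the question's own placeholders. It also shows why the pattern is "by no means perfect": any indented line contains 4 or more consecutive white-space characters, so it gets dropped as well.

```powershell
# Hypothetical sample lines, not the asker's real file.
$lines = @(
    '<?xml version="1.0" encoding="UTF-8"?>',
    'stringToMatch on this line',
    '    ',
    '<start_of_data>',
    '        <job>123456</job>',
    'secondStringToMatch here',
    '</start_of_data>'
)

# One regex with alternation: either placeholder string, or 4+ white-space characters in a row.
$pattern = 'stringToMatch|secondStringToMatch|\s{4,}'

# @(...) forces an array even if only one line survives.
$kept = @($lines | Where-Object { $_ -notmatch $pattern })

# The indented <job> line is removed too, because its leading spaces match \s{4,}.
$kept
```

If indented content has to survive, the white-space part of the pattern would need to be anchored to the whole line rather than matched anywhere in it.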

From talking in the comments and looking at the example file, you are just trying to omit lines that are blank. A string class method or another regex could fix that. These two function differently, but both would ignore lines that are just white-space.

  • ![string]::IsNullOrWhiteSpace($_)
  • -notmatch '^\s+$'

I would opt for the former, as it is more intuitive.

Where-Object {![string]::IsNullOrWhiteSpace($_) -and $_ -notmatch 'stringToMatch|secondStringToMatch'}

As I said in the comments, if you are picky about this requirement, you could filter out lines consisting of exactly 4 white-space characters with -notmatch '^\s{4}$'
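The anchored patterns behave slightly differently from one another, which a small sketch can show. The sample lines and variable names below are hypothetical; '^\s*$' is an extra variant added here for comparison and is not taken from the answer.

```powershell
# Hypothetical sample: two content lines, a 4-space line, an empty line, and a tab-only line.
$lines = @('<job>123456</job>', '    ', '', "`t", '  <name>ABC123</name>')

# '^\s+$' needs at least one white-space character, so the empty line is kept.
$oneOrMore = @($lines | Where-Object { $_ -notmatch '^\s+$' })

# '^\s*$' also matches the empty string, so empty lines are dropped as well.
$zeroOrMore = @($lines | Where-Object { $_ -notmatch '^\s*$' })

# '^\s{4}$' only drops lines made of exactly 4 white-space characters.
$exactlyFour = @($lines | Where-Object { $_ -notmatch '^\s{4}$' })

"{0} / {1} / {2} lines kept" -f $oneOrMore.Count, $zeroOrMore.Count, $exactlyFour.Count
```

Because the patterns are anchored with ^ and $, indented lines that carry real content are left alone.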


Also, as sodawillow says, you should have used double quotes to allow the subexpression to expand. Since you are using regex, \r would have worked just as well.

Where-Object {$_ -notmatch "$([char]0x0D)"}

However, I don't think you would have seen that character in order to exclude it anyway. Get-Content strips the line endings when it splits the file into a string array. That might depend on encoding.
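Both points can be checked with a short sketch. It writes a temporary file containing the byte sequence from the question, so the file contents and all variable names are assumptions for illustration. The last step is a swapped-in alternative rather than something proposed in this answer: it reads the file as one string with [System.IO.File]::ReadAllText, so the CRLF characters are still there to match.

```powershell
# Single quotes: no expansion, so the regex is the literal text '$([char]0x0D)' and never matches.
$singleQuoted = "a`rb" -match '$([char]0x0D)'
# Double quotes: the subexpression expands to a carriage return before the regex is built.
$doubleQuoted = "a`rb" -match "$([char]0x0D)"
# The regex escape \r matches the same character.
$regexEscape  = "a`rb" -match '\r'

# Temporary file holding: <a> CRLF, 4 spaces, CRLF, <b> CRLF (hypothetical content).
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($path, "<a>`r`n    `r`n<b>`r`n")

# Get-Content returns one string per line, with the line endings already removed.
$lines = @(Get-Content $path)
$linesWithCr = @($lines | Where-Object { $_ -match "`r" }).Count

# Alternative: work on the raw text and remove CRLF + 4 spaces when another CRLF follows.
$raw = [System.IO.File]::ReadAllText($path)
$cleaned = $raw -replace '\r\n[ ]{4}(?=\r\n)', ''

Remove-Item $path
```

With the file read line by line, the 4-space line shows up simply as a string of four spaces, which is why a line-level filter is enough to remove it.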

Try the .NET String class:

Where-Object {-not[string]::IsNullOrEmpty(([string]$_).trim())}

Trim will remove the surrounding white-space and IsNullOrEmpty will check whatever is left.
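Here is a minimal sketch of that filter combined with the string filters from the question, run on hypothetical in-memory lines. One reason this form may be preferable on PowerShell v2: as far as I know, [string]::IsNullOrWhiteSpace was only added in .NET Framework 4, whereas Trim and IsNullOrEmpty are available on the older runtime that v2 uses by default.

```powershell
# Hypothetical sample lines; 'stringToMatch' is the placeholder from the question.
$lines = @(
    '<start_of_data>',
    '    ',
    '',
    '        <job>123456</job>',
    'stringToMatch here'
)

$kept = @($lines | Where-Object {
    # Trim is applied only for the test, so kept lines retain their indentation.
    -not [string]::IsNullOrEmpty(([string]$_).Trim()) -and
    $_ -notmatch 'stringToMatch|secondStringToMatch'
})

$kept
```

The blank, white-space-only, and 'stringToMatch' lines are dropped, while the indented <job> line passes through unchanged.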
