
grep to exclude a symbol at the beginning of a line

I have an XML file that contains unescaped '<' characters inside its lines. The first thing I tried was to parse the XML using:

xmllint --noout filename.xml

but that doesn't work, because my XML version is 1.1, which is not supported. As an alternative, I started searching for any '<' that is not at the beginning or the end of the line.

This should be fairly easy. I tried:

grep -v '^[<]'

but that is not working. Can someone help?

For example, the file contains:

 <instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
  <field fieldname="CUR007" value="<EUR>"/>
  <field fieldname="C207" value="2023-01-11"/>
  <field fieldname="INS160" value="0"/>
  <field fieldname="PRD013" value="1020"/>
  <field fieldname="PRD150" value="0"/>
  <field fieldname="PRD205" value="0"/>
 </instrument>

I want the output to be:

 <instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
  <field fieldname="CUR007" value="<EUR>"/>
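
A short sketch of why the attempted command prints everything: `-v` inverts the match, so `grep -v '^[<]'` keeps every line that does not start with `<`. In the sample each line starts with whitespace, so no line is excluded. The `input` and `kept` variable names below are only for this illustration.

```shell
# Two lines from the question's sample; both start with whitespace, not '<'.
input=' <instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
  <field fieldname="C207" value="2023-01-11"/>'

# -v inverts the match: print the lines that do NOT start with '<'.
kept=$(printf '%s\n' "$input" | grep -v '^[<]')

# Both lines come back unchanged, including the one without a stray bracket.
printf '%s\n' "$kept"
```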

Search for a < or > that is not the first or last non-whitespace character on the line (those two should be the angle brackets that open and close the tag).

grep '^\s*<.*[<>].*>\s*' 

Note that the pattern spans the whole line, from the leading whitespace through the closing >, so it is useful when you want to act on the entire line rather than just the stray bracket.


A test:

grep '^\s*<.*[<>].*>\s*' << EOF
>  <instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
>   <field fieldname="CUR007" value="<EUR>"/>
>   <field fieldname="C207" value="2023-01-11"/>
>   <field fieldname="INS160" value="0"/>
>   <field fieldname="PRD013" value="1020"/>
>   <field fieldname="PRD150" value="0"/>
>   <field fieldname="PRD205" value="0"/>
>  </instrument>
> EOF

Output:

 <instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
  <field fieldname="CUR007" value="<EUR>"/>
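
`\s` is a GNU grep extension, so other grep implementations may not accept it. A minimal portable sketch, assuming the question's sample is written to a temporary file (the `sample` and `result` names are only for this illustration), uses the POSIX class `[[:space:]]` and adds a trailing `$` so that the line is required to end with `>`:

```shell
# Write the question's sample to a temporary file (used only for this demo).
sample=$(mktemp)
cat > "$sample" <<'EOF'
 <instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
  <field fieldname="CUR007" value="<EUR>"/>
  <field fieldname="C207" value="2023-01-11"/>
  <field fieldname="INS160" value="0"/>
  <field fieldname="PRD013" value="1020"/>
  <field fieldname="PRD150" value="0"/>
  <field fieldname="PRD205" value="0"/>
 </instrument>
EOF

# -n prefixes each matching line with its line number in the file.
# [[:space:]] is the POSIX equivalent of \s; $ anchors the match at end of line.
result=$(grep -n '^[[:space:]]*<.*[<>].*>[[:space:]]*$' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

With this sample, the command should report lines 1 and 2, the two lines that carry an extra bracket inside an attribute value.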

I've created a different sample to cover a few more cases:

$ cat ip.txt
foo bar < xyz
<123 abc <42> >
  <good>
bad > line

$ # get lines having < not at start of line
$ grep '[^[:blank:]].*<' ip.txt
foo bar < xyz
<123 abc <42> >

$ # get lines having > not at end of line
$ grep '>.*[^[:blank:]]' ip.txt
<123 abc <42> >
bad > line

$ # combining the two
$ grep -E '[^[:blank:]].*<|>.*[^[:blank:]]' ip.txt
foo bar < xyz
<123 abc <42> >
bad > line
  • [:blank:] represents space and tab characters
  • so [^[:blank:]] will match a non-blank character
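
As a rough check, the combined pattern can also be run against a shortened copy of the question's XML sample. This is only a sketch: the temporary file and the `count` and `extra` variables are made up for the illustration, and the `<a><b/></a>` line is a hypothetical input that is not in the question. It shows one caveat: a well-formed line that holds more than one tag is reported too, because its second `<` is preceded by non-blank text.

```shell
# Shortened copy of the question's sample in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 <instrument F001="6-A-1046" INSTRUMENT_ID="<xyz>" >
  <field fieldname="CUR007" value="<EUR>"/>
  <field fieldname="C207" value="2023-01-11"/>
 </instrument>
EOF

# -c prints only the number of matching lines.
count=$(grep -cE '[^[:blank:]].*<|>.*[^[:blank:]]' "$sample")
echo "$count"

# Hypothetical well-formed line with several tags: it is also counted.
extra=$(printf '%s\n' '<a><b/></a>' | grep -cE '[^[:blank:]].*<|>.*[^[:blank:]]')
echo "$extra"

rm -f "$sample"
```

If the real file places more than one tag on a line, the first answer's pattern, which only inspects what lies between the outermost brackets, has the same limitation, so such lines need a manual look either way.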
