简体   繁体   中英

Grep and filter out values from a file

I have a requirement to grep values from a xml file in shell sample file below: test.xml

<wtc-import>
      <name>WTCImportedService-288-rap04</name>
      <resource-name>CAC040F</resource-name>
      <local-access-point>lap01</local-access-point>
      <remote-access-point-list>rap04</remote-access-point-list>
      <remote-name>CAC040F</remote-name>
    </wtc-import>
    <wtc-import>
      <name>WTCImportedService-289-rap04</name>
      <resource-name>CAD040F</resource-name>
      <local-access-point>lap01</local-access-point>
      <remote-access-point-list>rap04</remote-access-point-list>
      <remote-name>CAD040F</remote-name>
    </wtc-import>
   <wtc-import>
      <name>WTCImportedService-290-rap04</name>
      <resource-name>CAE040F</resource-name>
      <local-access-point>lap01</local-access-point>
      <remote-access-point-list>rap04</remote-access-point-list>
      <remote-name>CAE040F</remote-name>
    </wtc-import>
    <wtc-import>
  <name>WTCImportedService-289-rap04</name>
  <resource-name>CAD040F</resource-name>
  <local-access-point>lap01</local-access-point>
  <remote-access-point-list>rap04</remote-access-point-list>
  <remote-name>CAD040F</remote-name>
</wtc-import>

Have to grep all values associated with in he file and at last if any duplicate resource name present remove the duplicated from the output file

Execpted output:

CAC040F
CAD040F
CAE040F

the resource CAD040F is a duplicate so in the expected output its just appeared once

Tried:

grep 'resource-name' test.xml | awk -F">" '{print $2}' | awk -F"<" '{print $1}' 

and this is working good..how about filtering duplicates after that?

You can do it with a single awk command

awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 } ' test.xml

with your sample xml file

$ awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 } ' test.xml
CAC040F
CAD040F
CAE040F

$

只是速度优化与@ stack0114106相比已经工作了

awk -F '[<>]' '$2 == "resource-name" && ! ( $3 in List) { print $3; List[$3] } ' test.xml

如果您已经获取了输出并且只是想删除重复项,那么最简单的方法是将输出通过管道进行排序,然后传递给uniq,因此您的命令将如下所示

grep 'resource-name' test.xml | awk -F">" '{print $2}' | awk -F"<" '{print $1}' | sort | uniq

If bash regex is your option, please try the following:

declare -A name
regex="<remote-name>([^<]+)</remote-name>"

while read -r line; do
    if [[ $line =~ $regex ]]; then
        name["${BASH_REMATCH[1]}"]=1
    fi
done < "test.xml"

for i in "${!name[@]}"; do
    echo "$i"
done

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM