Grep and filter out values from a file

Question

I need to extract values from an XML file in a shell script. The sample file, test.xml, is below:

<wtc-import>
      <name>WTCImportedService-288-rap04</name>
      <resource-name>CAC040F</resource-name>
      <local-access-point>lap01</local-access-point>
      <remote-access-point-list>rap04</remote-access-point-list>
      <remote-name>CAC040F</remote-name>
    </wtc-import>
    <wtc-import>
      <name>WTCImportedService-289-rap04</name>
      <resource-name>CAD040F</resource-name>
      <local-access-point>lap01</local-access-point>
      <remote-access-point-list>rap04</remote-access-point-list>
      <remote-name>CAD040F</remote-name>
    </wtc-import>
   <wtc-import>
      <name>WTCImportedService-290-rap04</name>
      <resource-name>CAE040F</resource-name>
      <local-access-point>lap01</local-access-point>
      <remote-access-point-list>rap04</remote-access-point-list>
      <remote-name>CAE040F</remote-name>
    </wtc-import>
    <wtc-import>
  <name>WTCImportedService-289-rap04</name>
  <resource-name>CAD040F</resource-name>
  <local-access-point>lap01</local-access-point>
  <remote-access-point-list>rap04</remote-access-point-list>
  <remote-name>CAD040F</remote-name>
</wtc-import>

I have to grep all values associated with <resource-name> in the file and, if any resource name is duplicated, remove the duplicates from the output file.

Expected output:

CAC040F
CAD040F
CAE040F

The resource CAD040F is a duplicate, so it appears only once in the expected output.

Tried:

grep 'resource-name' test.xml | awk -F">" '{print $2}' | awk -F"<" '{print $1}'

This works well. How can I filter out the duplicates after that?

Answer 1

You can do it with a single awk command:

awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 } ' test.xml

With your sample XML file:

$ awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 } ' test.xml
CAC040F
CAD040F
CAE040F

$
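For readers unfamiliar with the idiom, here is a minimal sketch of how the field separator and `!seen[$3]++` work together. The function name `unique_resource_names` and the inline sample lines are assumptions made for illustration; they are not part of the original answer.

```shell
# Hypothetical helper: print each <resource-name> value once, in first-seen order.
# Reads the files given as arguments, or stdin when called without arguments.
unique_resource_names() {
    # -F'[<>]' splits "  <resource-name>CAC040F</resource-name>" into:
    #   $1 = leading whitespace, $2 = "resource-name",
    #   $3 = "CAC040F",          $4 = "/resource-name"
    # seen[$3]++ evaluates to 0 (false) the first time a value appears and
    # then increments it, so !seen[$3]++ is true only for the first occurrence.
    awk -F'[<>]' '/resource-name/ && !seen[$3]++ { print $3 }' "$@"
}

# Inline sample standing in for test.xml (assumed data, trimmed from the question).
printf '%s\n' \
    '  <resource-name>CAC040F</resource-name>' \
    '  <resource-name>CAD040F</resource-name>' \
    '  <resource-name>CAD040F</resource-name>' |
    unique_resource_names
```

Because awk prints a value the moment it is first seen, the output keeps the order of the input file and needs no sorting step.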

Answer 2

This is just a speed optimization over @stack0114106's answer, which already works:

awk -F '[<>]' '$2 == "resource-name" && ! ( $3 in List) { print $3; List[$3] } ' test.xml

Answer 3

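If you already have the output and just want to remove the duplicates, the simplest way is to pipe the output through sort and then uniq, so your command would look like this: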

grep 'resource-name' test.xml | awk -F">" '{print $2}' | awk -F"<" '{print $1}' | sort | uniq
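As a possible shortening, `sort -u` does the work of `sort | uniq` in one step. The sketch below wraps the same pipeline in a function; the name `sorted_unique_names` and the inline sample are assumptions for illustration only.

```shell
# Hypothetical helper: the pipeline from the question with `sort -u` appended.
# Reads the files given as arguments, or stdin when called without arguments.
sorted_unique_names() {
    grep 'resource-name' "$@" |
        awk -F'>' '{print $2}' |   # keep everything after the opening tag
        awk -F'<' '{print $1}' |   # drop the closing tag
        sort -u                    # sort and remove duplicates in one step
}

# Inline sample standing in for test.xml (assumed data, deliberately unsorted).
printf '%s\n' \
    '  <resource-name>CAE040F</resource-name>' \
    '  <resource-name>CAC040F</resource-name>' \
    '  <resource-name>CAE040F</resource-name>' |
    sorted_unique_names
```

Note that either form returns the values in sorted order rather than in the order they appear in the file. The question's sample happens to be sorted already, so the difference does not show there.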

Answer 4

If a bash regex is an option for you, try the following:

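declare -A name    # associative array used as a set (requires bash 4+)
# Match <resource-name>, the element the question asks for.
regex="<resource-name>([^<]+)</resource-name>"

# Read the file line by line and record each captured value as a key.
while IFS= read -r line; do
    if [[ $line =~ $regex ]]; then
        name["${BASH_REMATCH[1]}"]=1
    fi
done < "test.xml"

# Keys are unique by construction; bash does not guarantee their iteration order.
for i in "${!name[@]}"; do
    echo "$i"
done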
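Bash does not guarantee the order in which associative-array keys are listed, so the final loop may print the names in a different order than the file. If the first-seen order matters, one option is to print each value as soon as it is first matched. This is a sketch assuming bash 4 or later; the function name `ordered_unique_names` is hypothetical, and it matches `<resource-name>`, the element the question asks about.

```shell
# Hypothetical helper (bash 4+): print unique <resource-name> values from stdin
# in the order they first appear.
ordered_unique_names() {
    local -A seen=()    # set of values already printed
    local regex='<resource-name>([^<]+)</resource-name>'
    local line value
    while IFS= read -r line; do
        if [[ $line =~ $regex ]]; then
            value=${BASH_REMATCH[1]}
            # Print only the first time this value is encountered.
            if [[ -z ${seen[$value]:-} ]]; then
                seen[$value]=1
                printf '%s\n' "$value"
            fi
        fi
    done
}

# Inline sample standing in for test.xml (assumed data, deliberately unsorted).
printf '%s\n' \
    '  <resource-name>CAE040F</resource-name>' \
    '  <resource-name>CAC040F</resource-name>' \
    '  <resource-name>CAE040F</resource-name>' |
    ordered_unique_names
```

With the real file, the call would be `ordered_unique_names < test.xml`.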
