
Grep and filter out values from a file

I need to grep values out of an XML file from a shell script. Here is a sample file for testing, test.xml:

<wtc-import>
      <name>WTCImportedService-288-rap04</name>
      <resource-name>CAC040F</resource-name>
      <local-access-point>lap01</local-access-point>
      <remote-access-point-list>rap04</remote-access-point-list>
      <remote-name>CAC040F</remote-name>
    </wtc-import>
    <wtc-import>
      <name>WTCImportedService-289-rap04</name>
      <resource-name>CAD040F</resource-name>
      <local-access-point>lap01</local-access-point>
      <remote-access-point-list>rap04</remote-access-point-list>
      <remote-name>CAD040F</remote-name>
    </wtc-import>
   <wtc-import>
      <name>WTCImportedService-290-rap04</name>
      <resource-name>CAE040F</resource-name>
      <local-access-point>lap01</local-access-point>
      <remote-access-point-list>rap04</remote-access-point-list>
      <remote-name>CAE040F</remote-name>
    </wtc-import>
    <wtc-import>
  <name>WTCImportedService-289-rap04</name>
  <resource-name>CAD040F</resource-name>
  <local-access-point>lap01</local-access-point>
  <remote-access-point-list>rap04</remote-access-point-list>
  <remote-name>CAD040F</remote-name>
</wtc-import>

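I have to grep all the values associated with resource-name in the file and, finally, if any resource name is duplicated, remove the duplicates from the output.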

Expected output:

CAC040F
CAD040F
CAE040F

The resource CAD040F is duplicated, so it appears only once in the expected output.

What I tried:

grep 'resource-name' test.xml | awk -F">" '{print $2}' | awk -F"<" '{print $1}' 

This works fine. How do I filter out the duplicates after that?

You can do this with a single awk command:

awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 } ' test.xml

With your sample XML file:

$ awk -F"[<>]" '/resource-name/ && !seen[$3]++ { print $3 } ' test.xml
CAC040F
CAD040F
CAE040F

$
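To see why this works: with -F"[<>]" every < and > acts as a field separator, so on a resource-name line the tag name lands in $2 and the value in $3, and !seen[$3]++ is true only the first time a given value shows up. A minimal sketch of that, where unique_resource_names is just a hypothetical wrapper name that reads the XML on standard input:

```shell
# Hypothetical helper (not from the original answer): print each
# <resource-name> value once, in the order it first appears on stdin.
unique_resource_names() {
    awk -F'[<>]' '/resource-name/ && !seen[$3]++ { print $3 }'
}

# Show how one sample line is split into fields by the [<>] separator.
# $1 is the leading whitespace, $2 the tag name, $3 the value.
printf '      <resource-name>CAC040F</resource-name>\n' |
    awk -F'[<>]' '{ printf "tag=[%s] value=[%s]\n", $2, $3 }'
```

It can then be called as unique_resource_names < test.xml. Keep in mind this relies on each element sitting on its own line, as in the sample; it is not a general XML parser.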

Just a speed optimization of @stack0114106's answer, which already works:

awk -F '[<>]' '$2 == "resource-name" && ! ( $3 in List) { print $3; List[$3] } ' test.xml

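If you already have the output and only want to remove the duplicates, the simplest way is to pipe the output through sort and then uniq, so your command would look like this: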

grep 'resource-name' test.xml | awk -F">" '{print $2}' | awk -F"<" '{print $1}' | sort | uniq
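Note that sort | uniq prints the names in sorted order rather than in the order they appear in the file (for the sample file the two happen to be the same). sort -u does the same job in one step. A small sketch, assuming a hypothetical wrapper named sorted_unique_resource_names that reads the XML on standard input:

```shell
# Hypothetical helper (name is illustrative): same pipeline as above,
# with "sort -u" replacing "sort | uniq". Output is sorted, so the
# original file order is not preserved.
sorted_unique_resource_names() {
    grep 'resource-name' |
        awk -F'>' '{ print $2 }' |
        awk -F'<' '{ print $1 }' |
        sort -u
}
```

Usage would be sorted_unique_resource_names < test.xml.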

If you prefer bash regex, try the following:

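# Requires bash 4+ for associative arrays.
declare -A name
regex="<resource-name>([^<]+)</resource-name>"

while IFS= read -r line; do
    if [[ $line =~ $regex ]]; then
        # Using the value as an array key drops duplicates automatically.
        name["${BASH_REMATCH[1]}"]=1
    fi
done < "test.xml"

# Note: the iteration order of associative array keys is not guaranteed.
for i in "${!name[@]}"; do
    echo "$i"
done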


