Remove whitespace from pipe output

Question

In a text file I have tags marked like :foo. To get an overview of the tags used in the file, I want a list of all of them.

I do this with

grep -o -e ":[a-z]*\( \|$\)" file.txt | sort |  uniq

Because a tag can be followed by either a space or a newline, I now get duplicates:

:movie  <-- only newline
:movie  <-- whitespace and newline
:read
:read

I would like to avoid the duplicates, but I don't know how. I tried | tr -d '[:space:]', but that only concatenates all of the piped output into one line...

Example of file.txt

Avengers: Infinity War :movie
Yojimbo 1961 :movie nippon

Answer 1

Some test lines (the first :space is followed by a trailing space, which you can see if you highlight the data with the mouse):

$ cat file
with :space 
with :space too
without :space
test: this

Using grep, sort and uniq:

$ grep -o ":[a-z]\+" file | sort | uniq 
:space
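
As a self-contained check of the pipeline above, the following sketch recreates the four test lines with printf and runs the same grep on them. The temporary file and the function name list_tags are assumptions made for illustration, and sort -u is used as a shorter equivalent of sort | uniq.

```shell
# Hypothetical helper: list the distinct :tags found in a file.
list_tags() {
    # -o prints only the matched text; \+ (a GNU extension in basic
    # regular expressions) requires at least one letter after the colon,
    # so the bare colon in "test:" is not reported.
    grep -o ':[a-z]\+' "$1" | sort -u
}

# Recreate the test data; the first line ends with a trailing space.
tmp=$(mktemp)
printf '%s\n' 'with :space ' 'with :space too' 'without :space' 'test: this' > "$tmp"

list_tags "$tmp"    # prints :space
rm -f "$tmp"
```

Because the pattern never includes the character after the tag, the trailing space or newline cannot end up in the output, which is what removes the duplicates.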

Using awk (works at least with gawk and mawk, which accept a regular expression as RS):

$ awk 'BEGIN{RS="[" FS "|" RS "]+"}/:[a-z]/&&!a[$0]++' file
:space

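Each word becomes its own record, and we print the first occurrence of each word that begins with a colon (strictly, the unanchored pattern /:[a-z]/ matches any word containing a colon followed by a lowercase letter). RS="[" FS "|" RS "]+" could be written in other ways, but this form emphasizes "any combination of FS and RS".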
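
To see what that record separator does, the sketch below spells out the default values (FS is a single space and RS is a newline, so the expression evaluates to "[ |\n]+") and prints every record before any filtering. It assumes an awk that treats a multi-character RS as a regular expression, such as gawk or mawk; the function names show_records and unique_tags are hypothetical.

```shell
# Hypothetical helper: print each record awk sees, numbered, one per line.
show_records() {
    awk 'BEGIN { RS = "[ |\n]+" } { print NR ": " $0 }'
}

# Hypothetical helper: first occurrence of each record matching /:[a-z]/.
unique_tags() {
    awk 'BEGIN { RS = "[ |\n]+" } /:[a-z]/ && !seen[$0]++'
}

# Two of the test lines; the first ends with a trailing space.
printf 'with :space \nwith :space too\n' | show_records
printf 'with :space \nwith :space too\n' | unique_tags
```

One detail worth knowing: inside a bracket expression the | is a literal character rather than an alternation, so a pipe character in the input would also act as a separator.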

Answer 2

You can use a Perl-compatible regular expression and match word characters:

grep -oP ':\w+' file.txt | sort |  uniq

Or match only non-space characters:

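grep -o ':[^ ]\+' file.txt | sort |  uniq

Using \+ rather than * requires at least one character after the colon, so the bare colon in "Avengers:" from the sample file is not reported as a tag.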
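
Where grep was built without -P (PCRE) support, an extended regular expression can stand in for \w. The sketch below assumes that [[:alnum:]_] is an acceptable substitute for \w in the current locale; the function name tags_ere and the temporary file are hypothetical. Note that -o is not part of POSIX grep, although GNU and BSD grep both provide it.

```shell
# Hypothetical helper: stand-in for  grep -oP ':\w+'  without PCRE.
tags_ere() {
    # -E enables extended regular expressions; + requires at least one
    # word character, so a bare colon such as the one in "Avengers:" is skipped.
    grep -oE ':[[:alnum:]_]+' "$1" | sort -u
}

# The two sample lines from the question.
tmp=$(mktemp)
printf '%s\n' 'Avengers: Infinity War :movie' 'Yojimbo 1961 :movie nippon' > "$tmp"

tags_ere "$tmp"    # prints :movie
rm -f "$tmp"
```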

Answer 3

Since you did not provide a sample Input_file, I could not test this; I also do not have zsh. Please try the following and let me know whether it helps.

awk '/:[a-z]*/{sub(/ +$/,"");} !a[$0]++' Input_file | sort
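
The same trailing-space removal can also be applied to the output of the grep from the question rather than to the whole file. The sketch below simulates that output with printf (the sample lines and the function names sample and strip_trailing are assumptions). It also shows why tr -d '[:space:]' joins everything together: the [:space:] class includes the newline, whereas [:blank:] covers only spaces and tabs.

```shell
# Hypothetical helper: drop trailing spaces/tabs from each line, then dedupe.
strip_trailing() {
    sed 's/[[:blank:]]*$//' | sort -u
}

# Simulated output of the question's grep: some tags carry a trailing space.
sample() {
    printf '%s\n' ':movie' ':movie ' ':read' ':read '
}

sample | strip_trailing                 # two lines: :movie and :read
sample | tr -d '[:blank:]' | sort -u    # same result; newlines are kept
sample | tr -d '[:space:]'              # one line: newlines are deleted too
echo                                    # finish the line left open above
```

Deleting blanks with tr is only safe here because the tags themselves contain no spaces; the sed variant touches trailing blanks only.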

Answer 4

You can try using sed:

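sed -n 's/.*\(:[a-z][a-z]*\).*/\1/p' file.txt | sort | uniq

The -n option together with the p flag prints only lines on which the substitution succeeded, so lines without a tag are not passed through unchanged, and [a-z][a-z]* requires at least one letter after the colon. Because the leading .* is greedy, only the last tag on each line is captured.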