
How do I remove words in a file from a list of words in another .txt file using bash?

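I am trying to use sed to remove the words in a list (a .txt file) from another file, but it does not work properly. The output file has the words it should remove deleted, but also parts of words that it should not touch.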

Here is the code I tried:

sed -E 's/('"$(tr '\n' '|' < listOfWords.txt )"')//gI' file.txt > output.txt

Example input:

I would love to try or hear the sample audio your app can produce. I do not want to purchase, because I've purchased so many apps that say they do something and do not deliver.  

Can you please add audio samples with text you've converted? I'd love to see the end results.

Thanks!

Expected output:

would love try hear sample audio app can produce. do want purchase, because ve purchased many apps say do something do deliver.  

Can please add audio samples text ve converted? d love see end results.

Sample of the word list:

...
I
to
or
the
your
not
so
that
they
and
you
with
Thanks
...

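Using \< and \> in the sed expression prevents removal of items that do not begin and end at word boundaries. Note that this is a GNUism: baseline POSIX sed may not support it.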
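As a quick illustration of the difference, the sketch below runs the same three-word alternation with and without the boundary anchors. The function names strip_naive and strip_bounded are hypothetical, and GNU sed is assumed for -E, \<, \> and the I flag.

```shell
# Naive version: matches the listed strings anywhere, even inside longer words.
strip_naive() { sed -E 's/(the|so|or)//gI'; }

# Bounded version: \< and \> (GNU sed) only match at the start and end of a word.
strip_bounded() { sed -E 's/\<(the|so|or)\>//gI'; }

printf '%s\n' 'they sort the story' | strip_naive     # prints "y rt  sty"
printf '%s\n' 'they sort the story' | strip_bounded   # prints "they sort  story"
```

Only the standalone word "the" is removed in the bounded version; "they", "sort" and "story" are left intact.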

The reproducer below uses shell functions in place of files, so it can be copied and pasted for testing without creating data files first:

getListOfWords() {
  printf '%s\n' I to or the your not so that they and you with Thanks
}

getInFile() {
  cat <<EOF
I would love to try or hear the sample audio your app can produce. I do not want to purchase, because I've purchased so many apps that say they do something and do not deliver.  

Can you please add audio samples with text you've converted? I'd love to see the end results.

Thanks!
EOF
}

sed -E 's/\<('"$(tr '\n' '|' < <(getListOfWords) )"')\>//gI' <(getInFile)

...emits as output:

 would love  try  hear  sample audio  app can produce.  do  want  purchase, because 've purchased  many apps  say  do something  do  deliver.  

Can  please add audio samples  text 've converted? 'd love  see  end results.

!

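...which matches your expected output, with the only exception that the expected output has some additional behavior around removing punctuation, which the original code did not attempt to implement.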
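If the leftover apostrophes and doubled spaces matter, one possible extension is sketched below. It is not part of the answer above: remove_words is a hypothetical helper, GNU sed is assumed, and the word list is assumed to hold one plain word per line with no regex metacharacters or slashes. It joins the list with paste (which leaves no trailing '|'), drops an apostrophe left at the start of a token, and squeezes whitespace.

```shell
# remove_words LIST INPUT (hypothetical helper, GNU sed assumed).
# Assumes LIST holds one plain word per line, with no regex metacharacters.
remove_words() (
  pattern=$(paste -sd'|' "$1")   # join the words as I|to|or|... with no trailing '|'
  sed -E \
    -e "s/\\<(${pattern})\\>//gI" \
    -e "s/(^| )'/\\1/g" \
    -e 's/[[:space:]]+/ /g' \
    -e 's/^ //; s/ $//' \
    "$2"
)

tmp=$(mktemp -d)
printf '%s\n' I to or the you > "$tmp/listOfWords.txt"
printf '%s\n' "I'd love to see the end results." > "$tmp/file.txt"
remove_words "$tmp/listOfWords.txt" "$tmp/file.txt"   # prints "d love see end results."
rm -rf "$tmp"
```

The four expressions run in order on each line: remove the listed words, drop a stray leading apostrophe, collapse runs of whitespace, then trim the ends.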

Another approach is to generate sed commands and pipe them into a second sed using sed -f-:

sed 's|^|s/|; s|$|\\s*//gI|' listOfWords.txt | sed -f- file.txt > output.txt

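This converts listOfWords.txt into sed substitution commands and pipes them to sed:

  • s/ is inserted at the start of each line in listOfWords
  • \s*//gI is appended at the end of each line in listOfWords
  • this results in s/word\s*//gI for each word in listOfWords
  • the list of substitutions is piped to sed -f- file.txt, where - means "stdin"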
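The steps above can be made visible by printing the generated script before applying it. The sketch below is a variant, not the exact command from this answer: it additionally wraps each word in \< and \> so that partial words are left alone. gen_script is a hypothetical helper name, and GNU sed is assumed for \<, \>, \s, the I flag and reading the script from stdin.

```shell
# gen_script LIST: hypothetical helper, prints one substitution command per word.
# The added \< and \> (GNU sed) keep each word from matching inside longer words.
gen_script() { sed 's|^|s/\\<|; s|$|\\>\\s*//gI|' "$1"; }

tmp=$(mktemp -d)
printf '%s\n' to the > "$tmp/listOfWords.txt"
printf '%s\n' 'They went to the story' > "$tmp/file.txt"

gen_script "$tmp/listOfWords.txt"
# prints:
#   s/\<to\>\s*//gI
#   s/\<the\>\s*//gI

gen_script "$tmp/listOfWords.txt" | sed -f- "$tmp/file.txt"
# prints: They went story
rm -rf "$tmp"
```

Because each word becomes its own substitution command, the commands run one after another on every input line rather than as a single alternation.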

This is conceptually somewhat similar to the OP's attempt with tr, but it uses only sed, which sed -f- makes possible.

Here is a Perl alternative. The first argument must be the listOfWords file.

perl -pe 'BEGIN {open F, shift; $w=join("|", <F>); $w=~s/\n//g;}
          s/\b($w)\b\s*//g;'  /tmp/listOfWords.txt  /tmp/file.txt

The \s* at the end of the regular expression also removes trailing whitespace, which avoids runs of consecutive spaces.

The output for your example is:

 would love try hear sample audio app can produce. do want purchase, because 've purchased many apps say do something do deliver. Can please add audio samples text 've converted? 'd love see end results. !

If you also want to remove the ' after "I" and "you", you can add I' and you' at the beginning of the word list file.

If GNU ed is available/acceptable:

#!/usr/bin/env bash

ed -s input.txt < <(
  printf '%s\n' ',s/^/\\b/' ',s/$/\\b/' '1,$-1s/$/\\|/' '1;$j' 's/^/,s\//' 's/$/\/\/g/' '$a' ,p w . ,p Q  |
  ed -s listOfWords.txt
)

