How do I remove words in a file from a list of words in another .txt file using bash?
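I'm trying to use sed to remove the words in a list (a .txt file), but it isn't working properly. The output file has the words it should remove deleted, but also parts of words that it should not touch.
Here is the code I tried: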
sed -E 's/('"$(tr '\n' '|' < listOfWords.txt )"')//gI' file.txt > output.txt
Sample input:
I would love to try or hear the sample audio your app can produce. I do not want to purchase, because I've purchased so many apps that say they do something and do not deliver.
Can you please add audio samples with text you've converted? I'd love to see the end results.
Thanks!
Expected output:
would love try hear sample audio app can produce. do want purchase, because ve purchased many apps say do something do deliver.
Can please add audio samples text ve converted? d love see end results.
Sample list of words:
...
I
to
or
the
your
not
so
that
they
and
you
with
Thanks
...
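Using \< and \> in the sed expression prevents the removal of items that do not start and end at a word boundary. Note that this is a GNUism; baseline POSIX-standard sed may not support it.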
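To see the effect of the word-boundary anchors in isolation, here is a minimal sketch. It assumes GNU sed; the function names strip_anywhere and strip_whole_words and the two-word list are made up for illustration.

```shell
# Hypothetical helpers for illustration only; both assume GNU sed.

# Removes "or" and "so" wherever they occur, even inside longer words.
strip_anywhere() {
  sed -E 's/(or|so)//g'
}

# Removes "or" and "so" only when they stand alone as whole words.
strip_whole_words() {
  sed -E 's/\<(or|so)\>//g'
}

printf '%s\n' 'words or something' | strip_anywhere
printf '%s\n' 'words or something' | strip_whole_words
```

The first call also eats the "or" inside "words" and the "so" inside "something"; the second leaves both words intact and removes only the standalone "or".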
The reproducer below uses shell functions in place of files, so it can be copied and pasted for testing without creating data files first:
getListOfWords() {
printf '%s\n' I to or the your not so that they and you with Thanks
}
getInFile() {
cat <<EOF
I would love to try or hear the sample audio your app can produce. I do not want to purchase, because I've purchased so many apps that say they do something and do not deliver.
Can you please add audio samples with text you've converted? I'd love to see the end results.
Thanks!
EOF
}
sed -E 's/\<('"$(tr '\n' '|' < <(getListOfWords) )"')\>//gI' <(getInFile)
...emits as output:
would love try hear sample audio app can produce. do want purchase, because 've purchased many apps say do something do deliver.
Can please add audio samples text 've converted? 'd love see end results.
!
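...which matches your expected output, with the one exception that the expected output has some additional behavior around removing punctuation that the original code made no attempt to implement.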
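If the dangling apostrophes and doubled spaces matter, one possible extension is to let the match also consume an apostrophe and any whitespace that directly follow a removed word. This is only a sketch: it assumes GNU sed, a word list with one plain word per line (no blank lines, no regex metacharacters), and a hypothetical helper name remove_words.

```shell
# remove_words LISTFILE: filter stdin, deleting every whole word listed in
# LISTFILE, plus an apostrophe and whitespace directly after that word.
# Hypothetical helper; assumes GNU sed and regex-safe words.
remove_words() {
  # paste joins the lines with "|" and leaves no trailing separator
  pattern=$(paste -sd'|' "$1")
  sed -E "s/\<(${pattern})\>'?[[:space:]]*//gI"
}

list=$(mktemp)
printf '%s\n' I to or the your not so that they and you with Thanks > "$list"
printf '%s\n' "I've purchased so many apps that do not deliver." |
  remove_words "$list"
rm -f "$list"
```

With the sample list this turns "I've" into "ve" and "I'd" into "d", as in the expected output. A lone "!" is still left behind on the "Thanks!" line, since nothing here removes other punctuation.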
There is a way to pipe sed commands into sed using sed -f-:
sed 's|^|s/|; s|$|\\s*//gI|' listOfWords.txt | sed -f- file.txt > output.txt
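This converts listOfWords.txt into sed substitution commands and pipes them to sed:
s/ replaces the beginning of each line in listOfWords
\s*//gI replaces the end of each line in listOfWords
s/word\s*//gI is the resulting command for each word in listOfWords
sed -f- file.txt runs that list of substitutions, where - means "stdin"
Conceptually this is somewhat similar to the OP's attempt with tr, but it uses only sed, which sed -f- makes possible.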
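To make the intermediate step visible, the sketch below prints the generated script instead of feeding it straight to sed. It also shows a variant that wraps each word in the GNU \< and \> anchors so that only whole words are removed; that variant is an assumption added here, not part of the answer above, and both function names are hypothetical.

```shell
# Turn a word list on stdin into one sed substitution per word,
# the same way the first sed in the pipeline above does.
make_script() {
  sed 's|^|s/|; s|$|\\s*//gI|'
}

# Variant (assumption): anchor each word with \< and \> (GNU sed only).
make_bounded_script() {
  sed 's|^|s/\\<|; s|$|\\>\\s*//gI|'
}

printf '%s\n' the your | make_script
# s/the\s*//gI
# s/your\s*//gI

printf '%s\n' the your | make_bounded_script
# s/\<the\>\s*//gI
# s/\<your\>\s*//gI

# Feed the generated script to a second sed, which reads it from stdin.
input=$(mktemp)
printf '%s\n' 'hear the sample audio your app makes, then bathe' > "$input"
printf '%s\n' the your | make_bounded_script | sed -f- "$input"
rm -f "$input"
```

With the bounded variant, "then" and "bathe" should survive even though both contain "the"; the unbounded script would cut into them.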
Here is a Perl alternative. The first argument must be the listOfWords file.
perl -pe 'BEGIN {open F, shift; $w=join("|", <F>); $w=~s/\n//g;}
s/\b($w)\b\s*//g;' /tmp/listOfWords.txt /tmp/file.txt
The \s* at the end of the regex also removes trailing whitespace, to avoid multiple consecutive spaces.
The output with your example is:
would love try hear sample audio app can produce. do want purchase, because 've purchased many apps say do something do deliver.
Can please add audio samples text 've converted? 'd love see end results.
!
If you also want to remove the ' after "I" and "you", you can add I' and you' at the beginning of the word list file.
If GNU ed is available/acceptable:
#!/usr/bin/env bash
ed -s input.txt < <(
printf '%s\n' ',s/^/\\b/' ',s/$/\\b/' '1,$-1s/$/\\|/' '1;$j' 's/^/,s\//' 's/$/\/\/g/' '$a' ,p w . ,p Q |
ed -s listOfWords.txt
)
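The inner ed invocation is dense, so here is a rough equivalent of what it prints, written with sed and paste purely for illustration. The helper name build_ed_script is hypothetical and not part of the answer above. The result is the three-line script that the outer ed then runs against input.txt.

```shell
# Read a word list on stdin and print the ed script that the inner
# "ed -s listOfWords.txt" above generates. Hypothetical helper.
build_ed_script() {
  # Wrap each word as \bword\b, join the lines with "|", turn each "|"
  # into the \| alternation, then add the ",s/" prefix and "//g" suffix.
  sed 's/.*/\\b&\\b/' |
    paste -sd'|' - |
    sed 's/|/\\|/g; s|^|,s/|; s|$|//g|'
  # Trailing commands: print the edited buffer, then write it back.
  printf '%s\n' ,p w
}

printf '%s\n' I to or | build_ed_script
# ,s/\bI\b\|\bto\b\|\bor\b//g
# ,p
# w
```

Because the generated script ends with w, the outer ed edits input.txt in place, in addition to printing the result.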