
Substituting words with sed, awk or grep

I want to replace certain words in text files, specifically British English spellings of words with their American spellings. I have two arrays of spellings that are in the same order, i.e.

list_1=['cosy', 'carat', 'cheque']
list_2=['cozy', 'karat', 'check']

Can I search a text file for the elements of list_1 and replace each one with the corresponding element of list_2?

This approach assumes that you have access to GNU sed.

The first thing to do is to get the information out of those lists with this script:

$ cat script.sh
list_1=('cosy' 'carat' 'cheque')
list_2=('cozy' 'karat' 'check')
for i in "${!list_1[@]}"
do
    echo "s/\\b${list_1[i]}\\b/${list_2[i]}/g"
done >spelling.sed

This produces the file:

$ cat spelling.sed 
s/\bcosy\b/cozy/g
s/\bcarat\b/karat/g
s/\bcheque\b/check/g

Now we can use that file to change spellings. For example:

$ echo "Decosy makes a cosy cheque." | sed -f spelling.sed
Decosy makes a cozy check.

Note that the spelling of Decosy is not changed. This is because the rules use the GNU extension \b, which matches a word boundary. In this way, only whole words are changed.
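To see what \b buys you, run the same substitution with and without it. This is a minimal sketch assuming GNU sed; the sample sentence is the one from the example above.

```shell
#!/usr/bin/env bash
# Compare a plain substitution with a word-bounded one (GNU sed's \b).
text="Decosy makes a cosy cheque."

# Without \b, the "cosy" inside "Decosy" is replaced as well.
plain=$(printf '%s\n' "$text" | sed 's/cosy/cozy/g')

# With \b on both sides, only the standalone word "cosy" is replaced.
bounded=$(printf '%s\n' "$text" | sed 's/\bcosy\b/cozy/g')

echo "plain:   $plain"     # plain:   Decozy makes a cozy cheque.
echo "bounded: $bounded"   # bounded: Decosy makes a cozy cheque.
```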

Here is an awk script that does the task in a single pass over the file.

script.awk

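BEGIN {
    # patsplit() (a gawk extension) collects each run of letters, so
    # arr1[n] and arr2[n] hold the n-th word of list1 and list2
    patsplit(list1, arr1, /[[:alpha:]]+/)
    patsplit(list2, arr2, /[[:alpha:]]+/)
}
{
    # for each line, replace whole-word occurrences of arr1[i] with arr2[i];
    # \< and \> are gawk word boundaries, so "Decosy" is left alone
    for (i in arr1) gsub("\\<" arr1[i] "\\>", arr2[i])
}
1   # always-true pattern: print the (possibly modified) line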

Execution:

 list_1="['cosy', 'carat', 'cheque']"
 list_2="['cozy', 'karat', 'check']"
 gawk -v list1="$list_1" -v list2="$list_2" -f script.awk input.txt

Note that this solution does not handle capitalized words. It also requires GNU awk (gawk), because patsplit() is a gawk extension.
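The sed rules generated earlier have the same limitation with capitalized words. One way to cover the common case of a capitalized first letter is to emit a second rule per pair with the first letter upper-cased. This is a sketch, assuming bash 4 or later (for the ${var^} case modification) and GNU sed (for \b and -i); input.txt is a hypothetical sample file created by the script itself. All-caps words such as COSY are still not handled.

```shell
#!/usr/bin/env bash
# Build spelling.sed with rules for both lower-case and Capitalized forms,
# then apply it in place to a sample file.
list_1=('cosy' 'carat' 'cheque')
list_2=('cozy' 'karat' 'check')

for i in "${!list_1[@]}"; do
    from=${list_1[i]}
    to=${list_2[i]}
    # lower-case rule, e.g. s/\bcosy\b/cozy/g ("\\b" in a printf format prints \b)
    printf 's/\\b%s\\b/%s/g\n' "$from" "$to"
    # Capitalized rule, e.g. s/\bCosy\b/Cozy/g (${var^} needs bash 4+)
    printf 's/\\b%s\\b/%s/g\n' "${from^}" "${to^}"
done > spelling.sed

# Hypothetical sample file to edit.
printf 'Cosy rooms, cosy chairs.\nA Cheque for a carat.\n' > input.txt

# GNU sed -i rewrites the file in place.
sed -i -f spelling.sed input.txt
cat input.txt
```

With GNU sed, -i also accepts a suffix (for example -i.bak) if you want to keep a backup of each original file.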

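Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.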

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM