
Substituting words with sed, awk or grep

I want to replace certain words in text files: specifically, British spellings of words with their American spellings. I have two arrays of spellings, in the same order, i.e.:

list_1=['cosy', 'carat', 'cheque']
list_2=['cozy', 'karat', 'check']

Can I search a text file for the elements of list_1 and replace each one with the corresponding element of list_2?

This approach assumes that you have access to GNU sed.

The first step is to turn those lists into a file of sed commands, using this script:

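$ cat script.sh
#!/usr/bin/env bash
list_1=('cosy' 'carat' 'cheque')
list_2=('cozy' 'karat' 'check')
# Emit one substitution per word pair; \b (a GNU sed extension) anchors each word.
for i in "${!list_1[@]}"
do
    echo "s/\\b${list_1[i]}\\b/${list_2[i]}/g"
done >spelling.sed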

This produces the file:

$ cat spelling.sed 
s/\bcosy\b/cozy/g
s/\bcarat\b/karat/g
s/\bcheque\b/check/g

Now, we can use that file to change spellings. For example:

$ echo "Decosy makes a cosy cheque." | sed -f spelling.sed
Decosy makes a cozy check.

Note that Decosy is not changed. This is because of the GNU extension \b, which matches a word boundary, so only whole words are replaced.
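The question asks about text files rather than echo output. Below is a minimal sketch, assuming bash and GNU sed (whose -i option edits a file in place), that builds the same substitution commands in a bash array instead of writing spelling.sed, then applies them to a file. The file name input.txt and its contents are hypothetical sample data.

```shell
#!/usr/bin/env bash
# Sketch: apply the British -> American substitutions to a file in place.
# Assumes bash (for arrays) and GNU sed (for \b and -i).
list_1=('cosy' 'carat' 'cheque')
list_2=('cozy' 'karat' 'check')

# Build one "-e" expression per word pair instead of writing spelling.sed.
sed_args=()
for i in "${!list_1[@]}"; do
    sed_args+=(-e "s/\\b${list_1[i]}\\b/${list_2[i]}/g")
done

# Hypothetical sample file for the demonstration.
printf 'Decosy makes a cosy cheque.\nA 24 carat ring.\n' > input.txt

# -i.bak keeps the original as input.txt.bak; plain -i skips the backup.
sed -i.bak "${sed_args[@]}" input.txt
cat input.txt
```

The file then reads "Decosy makes a cozy check." and "A 24 karat ring.", while input.txt.bak keeps the original text. Keeping a backup is a cautious default when running bulk substitutions over real documents.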

Here is an awk script that does the task in a single pass over the file. It needs GNU awk (gawk), because patsplit() is a gawk extension.

script.awk

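BEGIN {
    patsplit(list1, arr1, /[[:alpha:]]+/);  # read array of words from list1
    patsplit(list2, arr2, /[[:alpha:]]+/);  # read array of words from list2
}
{
    # for each line, replace every listed word; \< and \> (gawk regexp
    # operators) restrict matches to whole words, so Decosy is left alone
    for (i in arr1) gsub("\\<" arr1[i] "\\>", arr2[i]);
}
1  # always-true pattern: print every (possibly modified) line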

Execution (the lists must be quoted so the shell passes each one to gawk as a single string):

 list_1="['cosy', 'carat', 'cheque']"
 list_2="['cozy', 'karat', 'check']"
 gawk -v list1="$list_1" -v list2="$list_2" -f script.awk input.txt

Note that this solution does not handle capitalized words: Cosy, for example, is left unchanged.
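One way to cover capitalized words, sketched here for the sed approach from the first answer rather than for the awk script, is to emit a second rule per word pair for the capitalized forms. This assumes bash 4 or later, whose ${var^} expansion uppercases the first character, and GNU sed for \b.

```shell
#!/usr/bin/env bash
# Sketch: also replace capitalized forms (Cosy -> Cozy).
# Assumes bash >= 4 for ${var^} and GNU sed for \b.
list_1=('cosy' 'carat' 'cheque')
list_2=('cozy' 'karat' 'check')

for i in "${!list_1[@]}"; do
    # Lower-case rule, as in the original script.sh.
    echo "s/\\b${list_1[i]}\\b/${list_2[i]}/g"
    # Capitalized rule: ${word^} uppercases the first letter only.
    echo "s/\\b${list_1[i]^}\\b/${list_2[i]^}/g"
done > spelling.sed

echo "Cosy chairs; a cosy cheque. Cheque cleared." | sed -f spelling.sed
```

This prints "Cozy chairs; a cozy check. Check cleared." Words written entirely in capitals, such as COSY, would still need a further rule, for example one built with ${word^^}.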
