
Escaping double quotation marks in sed

I am creating a search-and-replace function for my application and am testing it with three files: array, script and test.

I am trying to escape double quotation marks, but it won't work.

The script file contains:

variableName=$1
sed "s#data\-field\=\"${variableName}\.name\"#data\-field\=${variableName}\.name data\-type\=dropdown data\-dropdown\-type\=${variableName}#g" test

The test file contains:

data-field=“fee_category.name”
data-field=“tax_type.name”

The array file contains:

fee_category
tax_type

There is no error; the output is identical to the input because sed does not find the pattern. If I remove the double quotes around ${variableName} from the sed command and from the test file, the function works fine.
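To see why nothing is replaced, here is a minimal sketch (assuming a POSIX shell and sed) that feeds a simplified form of the same pattern one line with ASCII double quotes and one line with the curly quotes visible in the test file above. The unneeded backslashes before - and = are dropped, and the variable names are illustrative only.

```shell
#!/bin/sh
# Simplified version of the question's pattern: it expects ASCII " (0x22).
variableName=fee_category
pattern="s#data-field=\"${variableName}\.name\"#data-field=${variableName}.name data-type=dropdown data-dropdown-type=${variableName}#g"

ascii_line='data-field="fee_category.name"'
# Octal escapes make printf emit the UTF-8 bytes of U+201C and U+201D.
curly_line=$(printf 'data-field=\342\200\234fee_category.name\342\200\235')

printf '%s\n' "$ascii_line" | sed "$pattern"   # attributes are added
printf '%s\n' "$curly_line" | sed "$pattern"   # line is printed unchanged
```

Only the first line is rewritten, which matches the symptom described: no error, just unchanged output.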

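If in doubt, you can match the quotes with the . wildcard, which matches any single character (in a UTF-8 locale this includes a multibyte curly quote such as “):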

variableName="fee_category"
sed "s#data-field=.${variableName}\.name.#& data-type=dropdown data-dropdown-type=${variableName}#g" test

# Or, if you do not want the quotes in the output:
sed "s#\(data-field=\).\(${variableName}\)\(\.name\).#\1\2\3 data-type=dropdown data-dropdown-type=\2#g" test
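The question also mentions an array file listing the variable names. A minimal sketch, assuming a POSIX shell and one name per line, that applies the capture-group variant above for every name in a single sed pass; add_dropdown_attrs and the temporary sample files are hypothetical. Because the quotes are matched with ., straight ASCII quotes work in any locale, while curly quotes need a UTF-8 locale.

```shell
#!/bin/sh
# Hypothetical helper: for every name in $1 (one per line), add the dropdown
# attributes to matching data-field=<quote>NAME.name<quote> entries in $2.
add_dropdown_attrs() {
  names_file=$1
  data_file=$2
  script=''
  while IFS= read -r name; do
    [ -n "$name" ] || continue   # skip blank lines
    # One s command per name; '.' stands in for the surrounding quote character.
    script="${script}s#\(data-field=\).\(${name}\)\(\.name\).#\1\2\3 data-type=dropdown data-dropdown-type=\2#g
"
  done < "$names_file"
  sed "$script" "$data_file"
}

# Usage with throwaway sample files (straight quotes, so any locale works).
dir=$(mktemp -d)
printf '%s\n' fee_category tax_type > "$dir/array"
printf '%s\n' 'data-field="fee_category.name"' 'data-field="tax_type.name"' > "$dir/test"
add_dropdown_attrs "$dir/array" "$dir/test"
rm -r "$dir"
```

Building one newline-separated sed script means the data file is read only once, however many names the array file holds.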

Following mklement0's comment, I am writing this answer only to share some findings in case a literal match of your special double quotes is needed. It might be useful to other users.

In your file, the text fee_category.name is surrounded by the Unicode Left Double Quotation Mark (U+201C) on the left and the Unicode Right Double Quotation Mark (U+201D) on the right, not by the ASCII double quote (U+0022) that your sed pattern expects.

These non-standard quotation marks have the following UTF-8 and UTF-16 representations:

Unicode Left Double Quotation Mark U+201c
UTF-8 (hex) 0xE2 0x80 0x9C (e2809c)
UTF-16 (hex) 0x201C (201c)

Unicode Right Double Quotation Mark U+201d
UTF-8 (hex) 0xE2 0x80 0x9D (e2809d)
UTF-16 (hex) 0x201D (201d)
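The UTF-8 bytes listed above follow from the encoding rule for code points U+0800 to U+FFFF, which are stored in three bytes of the form 1110xxxx 10xxxxxx 10xxxxxx. A small shell sketch of that bit arithmetic; utf8_3byte is a hypothetical helper and assumes its argument lies in that range.

```shell
#!/bin/sh
# Hypothetical helper: print the 3-byte UTF-8 encoding of a code point
# in the range U+0800..U+FFFF as lowercase hex bytes.
utf8_3byte() {
  cp=$1
  printf '%02x %02x %02x\n' \
    "$(( 0xE0 | ($cp >> 12) ))" \
    "$(( 0x80 | (($cp >> 6) & 0x3F) ))" \
    "$(( 0x80 | ($cp & 0x3F) ))"
}

utf8_3byte 0x201C   # left double quotation mark  -> e2 80 9c
utf8_3byte 0x201D   # right double quotation mark -> e2 80 9d
```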

Running your data through the od utility confirms that these UTF-8 byte sequences are present:

$ echo data-field=“fee_category.name” |od -w40 -t x1c
0000000  64  61  74  61  2d  66  69  65  6c  64  3d  e2  80  9c  66  65  65  5f  63  61  74  65  67  6f  72  79  2e  6e  61  6d  65  e2  80  9d  0a
          d   a   t   a   -   f   i   e   l   d   = 342 200 234   f   e   e   _   c   a   t   e   g   o   r   y   .   n   a   m   e 342 200 235  \n

Interestingly, bash can print these characters either from their Unicode code points or from their UTF-8 hex byte sequences:

$ echo -e "\u201c test \u201d"
“ test ”
$ echo -e "\xe2\x80\x9c test \xe2\x80\x9d"
“ test ”
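echo -e is not portable across shells (dash, for example, prints the -e literally), whereas printf with octal escapes in its format string is specified by POSIX. A minimal sketch, assuming only a POSIX shell and od, that produces the same two characters from the octal form of their UTF-8 bytes (342 200 234 and 342 200 235, as in the od dump above):

```shell
#!/bin/sh
# Build the curly quotes from their UTF-8 bytes written in octal.
lq=$(printf '\342\200\234')   # U+201C, bytes e2 80 9c
rq=$(printf '\342\200\235')   # U+201D, bytes e2 80 9d
printf '%s test %s\n' "$lq" "$rq"

# Check the bytes of the left quote with od.
printf '%s' "$lq" | od -An -t x1
```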

Accordingly, we can make sed match these special characters literally:

$ string=$(echo -e "\u201c test \u201d");echo "$string"
“ test ”
$ lq=$(echo -ne "\u201c");rq=$(echo -ne "\u201d")
$ sed -E "s/($lq)(.+)($rq)/**\2**/" <<<"$string"
** test **
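Applied to the original problem, the helper-variable approach could look like the following sketch. It assumes a POSIX shell and sed, builds the quotes with printf instead of echo -e, and uses a hypothetical function add_attrs_curly fed by a sample line in place of the test file.

```shell
#!/bin/sh
# lq/rq hold the UTF-8 bytes of U+201C/U+201D.
lq=$(printf '\342\200\234')
rq=$(printf '\342\200\235')

# Hypothetical helper: the question's substitution with the curly quotes
# matched literally; $1 is the variable name (e.g. fee_category).
add_attrs_curly() {
  sed "s#data-field=${lq}$1\.name${rq}#data-field=$1.name data-type=dropdown data-dropdown-type=$1#g"
}

# Sample line standing in for the question's "test" file.
printf 'data-field=%sfee_category.name%s\n' "$lq" "$rq" | add_attrs_curly fee_category
```

Because the quote bytes are passed to sed as plain text rather than escape sequences, this does not depend on GNU extensions.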

This also seems to work without the "helper" variables:

$ sed -E "s/(\xe2\x80\x9c)(.+)(\xe2\x80\x9d)/**\2**/" <<<"$string"
** test **

This means that the hex sequence \xe2\x80\x9c (or \xe2\x80\x9d for the right quote) can be used directly in a sed expression to match these special quotes literally. Note that the \xHH escape is a GNU sed extension.
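Using the hex sequences directly, a pattern can also accept either kind of quote, so the same script handles files with straight and with curly quotes. A sketch that assumes GNU sed (for \xHH and -E); add_attrs_any_quote is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper, GNU sed only: accept an ASCII double quote or the
# curly U+201C/U+201D pair around the value; $1 is the variable name.
add_attrs_any_quote() {
  sed -E "s#data-field=(\"|\xe2\x80\x9c)$1\.name(\"|\xe2\x80\x9d)#data-field=$1.name data-type=dropdown data-dropdown-type=$1#g"
}

printf '%s\n' 'data-field="fee_category.name"' | add_attrs_any_quote fee_category
printf 'data-field=\342\200\234fee_category.name\342\200\235\n' | add_attrs_any_quote fee_category
```

Both input lines should produce the same rewritten output.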

Alternatively, you could pre-process your files and convert all of these non-standard quotes to standard ones with something like:

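$ sed -E "s/\xe2\x80\x9c|\xe2\x80\x9d/\x22/g" <<<"$string"
" test "   # Special quotes replaced with plain ASCII quotes.
# Alternation (|) is used instead of a bracket expression: [\xe2\x80\x9c,\xe2\x80\x9d]
# would also match commas, and outside a UTF-8 locale it matches single bytes.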
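The same pre-processing can be done without GNU's \x escapes by letting printf supply the quote bytes. A minimal sketch, assuming a POSIX shell and sed; normalize_quotes is a hypothetical name, and the second sed is the question's original ASCII-only pattern (simplified).

```shell
#!/bin/sh
lq=$(printf '\342\200\234')   # U+201C
rq=$(printf '\342\200\235')   # U+201D

# Hypothetical helper: replace both curly quotes with ASCII ".
# Reads the named files, or standard input when none are given.
normalize_quotes() {
  sed "s/${lq}/\"/g; s/${rq}/\"/g" "$@"
}

variableName=fee_category
printf 'data-field=%s%s.name%s\n' "$lq" "$variableName" "$rq" |
  normalize_quotes |
  sed "s#data-field=\"${variableName}\.name\"#data-field=${variableName}.name data-type=dropdown data-dropdown-type=${variableName}#g"
```

To rewrite files in place, GNU sed accepts -i, while BSD/macOS sed needs -i '' instead.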

These tests were run on Debian Testing with Bash 4.4 and GNU sed 4.4; the techniques may not work with other sed implementations.
