
Sed and Awk Escaping Ampersands (&)

I'm parsing a PHP file and wrapping function prototype elements in HTML. If there are ampersands however, it breaks my code.

Input : function foo (&$var1, &$var2){...} //as String
Desired output (in HTML) : &$var1, &$var2 //basically, just output the variables so that they are properly displayed in a browser

Right now, I am sending each variable into awk's sub function, and then to sed.

sub(/^&/, "\\\&", param)  #param is the variable of interest (e.g. &$var1)

#Intermediate step in case it's relevant. The awk-processed elements
#are sent to ${file}_param.txt. Each set of parameters is delimited by colons.
param=$(cut -d: -f"$counter" "${file}_param.txt")

#Replace some default text in template file with real stuff.
sed -i "s|@PARAM|$param|1" "$base"_funct_def.txt

Output I'm getting: The ampersands are being interpreted. The entire match is replaced.

Isolation of issue: Doing the following instead displays 'g$var1' in the browser as I want it to. However, I'm trying to get an '&' instead.

sub(/^&/, "g", param)

My attempts: I used three backslashes because I thought awk would first process it into '\\&' which, fed into sed, would interpret '\\&' as the literal '&'. I have tried anywhere from 1 to 6 backslashes though, to no avail.

QUESTION: How can I escape the &?

Manual: http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_92.html


Some "meta"/design questions about what I'm trying to do (not needed for question!!)
I have a bunch of PHP files that I'm trying to generate a bit of documentation for (structured like Javadocs). I'm parsing them with regular expressions and shell scripts, so that I can list each function's name, parameters, and return item(s). So far, regex has worked out pretty well I think, but I have read a lot about how this is something that regex should NOT be used for. I'd welcome any comments about any of this (how is documentation usually generated?). Thanks guys!

I believe HTML reads &amp;amp; as the ampersand character. In your awk script you could use:

sub(/^&/, "&amp;amp;", param)

The dollar sign in param needs to be escaped with a backslash, e.g. &$var needs to be written as &\$var, wherever the string passes through the shell inside double quotes; otherwise the shell will try to expand $var as a variable.

Using two backslashes (i.e. sub(/^&/, "\\&", param) ) works for me. Doesn't it work for you?

It is documented in the nawk manual you referred to in your question:

As usual, to insert one backslash in the string, you must write two backslashes. Therefore, write `\\&' in a string constant to include a literal `&' in the replacement

Also, your sub() function is essentially replacing an ampersand with an ampersand. So maybe that's why you think it's not working even with two backslashes.
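To see the difference on a case where the matched text is not itself an ampersand, here is a minimal sketch. The input string `@PARAM` and the square brackets around the replacement are only illustrative choices, not part of the original script.

```shell
# Unescaped "&" in the replacement stands for the matched text.
matched=$(printf '%s\n' '@PARAM' | awk '{ sub(/@PARAM/, "[&]"); print }')

# "\\&" in the string constant reaches sub() as \&, i.e. a literal ampersand.
literal=$(printf '%s\n' '@PARAM' | awk '{ sub(/@PARAM/, "[\\&]"); print }')

printf '%s\n%s\n' "$matched" "$literal"
```

The first line printed should be `[@PARAM]` and the second `[&]`, which makes the effect of the two backslashes visible.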

sed method:

printf "%s\n" 'function foo (&$var1, &$var2){...}//as String' | 
sed -n '/function/{s/^.*(//;s/).*$//;p}'

Output:

&$var1, &$var2

Or if HTML code is needed, pass that to a util like txt2html :

printf "%s\n" 'function foo (&$var1, &$var2){...}//as String' | 
sed -n '/function/{s/^.*(//;s/).*$//;p}' | txt2html
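If txt2html is not available, a rough alternative is to do the escaping in sed itself. The sketch below assumes the parameter list has already been extracted by the sed command above, and it uses a made-up template line (`<code>@PARAM</code>`) in place of the real template file. It only handles ampersands; `<` and `>` would need the same treatment for full HTML escaping.

```shell
# Hypothetical parameter list, as extracted by the sed command above.
# Single quotes keep the shell from expanding $var1 and $var2.
params='&$var1, &$var2'

# 1. HTML-escape: in a sed replacement, \& is a literal ampersand.
html=$(printf '%s\n' "$params" | sed 's/&/\&amp;/g')

# 2. Prefix \, & and the | delimiter with a backslash so the text
#    is safe to use as a sed replacement string.
safe=$(printf '%s\n' "$html" | sed 's/[\\&|]/\\&/g')

# 3. Substitute into a hypothetical template line.
result=$(printf '%s\n' '<code>@PARAM</code>' | sed "s|@PARAM|$safe|")
printf '%s\n' "$result"
```

Step 2 is the part that addresses the original problem: without it, each `&` in the replacement would be expanded by sed to the matched text `@PARAM`.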
