
awk keep field separator when using sub

I'm trying to "obfuscate" some JavaScript code (make it unreadable to discourage piracy), and I'm using awk to do it. That works fine for long words, but not for single-character words.

Input text:

var t=document.getElementById(u)

Expected output:

var b7=document.getElementById(b8)

Actual output:

var b7 document getElementById b8

Awk code:

${cor_var} is a shell variable that contains "t" in this example.
${obf_var} is a shell variable that contains "b7" (the obfuscated name).

awk -v AWK_COR_VAR="${cor_var}" -v AWK_OBF_VAR="${obf_var}" '

      # Use runs of non-word characters as the field separator,
      # so that each field is a variable or function name
      BEGIN {FS="[^A-Za-z0-9_]+"}
      {
        if ($0 ~ AWK_COR_VAR) {
          # On a line containing our word, go through every field until we find the word,
          # then replace it with sub
          for ( x = 1; x <= NF; x++ ) {
            # Only touch fields that are exactly the word we are looking for
            if ($x == AWK_COR_VAR) {sub($x, AWK_OBF_VAR, $x)};
          }
          print $0;
        } else {print $0}
      }' "$file"

It seems that the sub function gets rid of the field separators. I also tried sub without the third argument, which keeps the field separators but also changes 't' where it shouldn't:

if ($x == AWK_COR_VAR) {sub($x, AWK_OBF_VAR)};

Output:

b7=documenb7.getElementById(t)

sub isn't what's getting rid of your field separators. Here's what's happening:

  1. awk discards the field-separator text when it splits each line into the fields $1 .. $NF. $0 is initially left as the original line text.
  2. The moment you assign to one of the fields (e.g. $1), awk rebuilds $0 as the concatenation of all the fields, separated by OFS, the output field separator. By default, OFS is a single space. A sub() call that changes a field, such as your sub($x, AWK_OBF_VAR, $x), counts as such an assignment.

So when you print $0, there are two cases: (1) you didn't modify any of the fields, so you see the original, complete line; or (2) you did modify a field, so you see the fields joined by spaces, with all the punctuation stripped out.
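To see both cases in isolation, here is a minimal shell sketch that runs the question's FS over the question's input line, once printing the record untouched and once after assigning to $2. Only POSIX awk features are used; the shell variable name line is just for illustration.

```shell
#!/bin/sh
# Demo of awk record rebuilding, using the FS and input line from the question.
line='var t=document.getElementById(u)'

# Case 1: no field is modified, so $0 is still the original text.
echo "$line" | awk 'BEGIN { FS = "[^A-Za-z0-9_]+" } { print $0 }'
# prints: var t=document.getElementById(u)

# Case 2: assigning to $2 makes awk rebuild $0 from $1..$NF joined by OFS (" ").
# The closing ")" at the end of the line leaves an empty last field,
# which is why the rebuilt line ends with a space.
echo "$line" | awk 'BEGIN { FS = "[^A-Za-z0-9_]+" } { $2 = "b7"; print $0 }'
# prints: "var b7 document getElementById u " (with a trailing space)
```

Changing OFS does not help here, because a single OFS cannot reproduce the different punctuation (=, ., parentheses) that originally sat between the fields.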

If you continue down this path, you'll need to preserve the original punctuation, which means not using FS for tokenization. Instead, you'll need something like scanning each line for word boundaries one token at a time, checking each token against your trigger names, and building up the result line as you go.
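A minimal sketch of that scanning approach, assuming (as the question's FS does) that identifiers are runs of [A-Za-z0-9_]. The function name obfuscate and its "old=new,old=new" mapping argument are hypothetical, made up for this example. It uses POSIX match(), RSTART and RLENGTH to find each token, copies the text between tokens unchanged, and swaps a token only when it exactly equals one of the names to replace. It makes no attempt to skip quoted strings or property names.

```shell
#!/bin/sh
# Hypothetical helper: rename whole identifiers on stdin while keeping all punctuation.
# $1 is a mapping such as "t=b7,u=b8" (format chosen for this sketch).
obfuscate() {
  awk -v map="$1" '
  BEGIN {
    # Build a lookup table: rename["t"] = "b7", rename["u"] = "b8", ...
    n = split(map, pairs, ",")
    for (i = 1; i <= n; i++) {
      split(pairs[i], kv, "=")
      rename[kv[1]] = kv[2]
    }
  }
  {
    rest = $0   # text not scanned yet
    out = ""    # result line built so far
    # Find the next run of word characters, one token at a time
    while (match(rest, /[A-Za-z0-9_]+/)) {
      word = substr(rest, RSTART, RLENGTH)
      # Copy the punctuation and whitespace before the token unchanged
      out = out substr(rest, 1, RSTART - 1)
      # Replace the token only if it is exactly one of the trigger names
      if (word in rename) word = rename[word]
      out = out word
      rest = substr(rest, RSTART + RLENGTH)
    }
    # Append whatever follows the last token, e.g. the closing ")"
    print out rest
  }'
}

echo 'var t=document.getElementById(u)' | obfuscate 't=b7,u=b8'
# prints: var b7=document.getElementById(b8)
```

Because the comparison is done on whole tokens, the t inside document is left alone, unlike sub() applied to the whole line.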

But beware: unless your scanner is fairly sophisticated, you also risk matching your variable names inside quoted strings ("I want a t-shirt.") and in JavaScript property names (blort = foo.t.bar).

My real recommendation is to use one of the several existing JavaScript obfuscators. Google's Closure (https://developers.google.com/closure/), a package of tools that includes obfuscation, is a pretty good choice.
