
How can I parse C switch-case and for statements in AWK?

How can I parse a switch-case statement like the one below with awk? I want to create a simple C syntax checker in awk. The checker must read the code and report whether it contains a syntax error and, if so, print what the error is.

switch(number)
{
    case 1  : number = 'a'; break;
    case 2  : number = 'b'; break;
    default : number = 'x'; 
}

And a for() statement like this:

for(i=0;i<10;i++) 
    {
        number = 'A';
    }

My current code for the switch-case statement is:

#parser_switchcase.awk
{
for(i=1; i<=NF; i++)
{
  if($i~/switch\([[:alnum:]]+\)/)
    print("switch(VALID_VARIABLE)")
}

}

The result for my first C switch-case code above:

master@master:~/Dokumen/Root$ awk -f parser_switchcase.awk soalswitch 
switch(VALID_VARIABLE)

But it is incomplete and needs many improvements.

I only need awk code that reads and checks exactly the code examples I have typed above. I do not need it to handle anything beyond them, such as additional functions or other code.

Using awk for C syntax checks is a brave project. Have fun!

I would use gcc for syntax checks. Try this:

gcc -fsyntax-only test.c
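
Note that the snippets in the question are fragments, not complete C files, so gcc would reject them as they stand. A minimal sketch of one way around that, assuming the fragment only uses int variables named number and i (as in the question): wrap it in a dummy function and feed the result to gcc on standard input. The helper name check_fragment and the wrapper function are made up for illustration.

```sh
# check_fragment is a hypothetical helper, not a gcc feature.
# It wraps a C fragment (read from files or stdin) in a dummy function
# and asks gcc to parse it without generating any output file.
check_fragment() {
  {
    # Assumed declarations: the question only uses `number` and `i`.
    printf 'void check_fragment_wrapper(void)\n{\n    int number = 0, i = 0;\n'
    cat "$@"
    printf '\n}\n'
  } | gcc -fsyntax-only -x c -
}
```

For example, check_fragment soalswitch && echo "no syntax errors". The line numbers in gcc's messages are shifted by the three wrapper lines.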

As others have suggested, awk is not the right tool for this job...
But if you can guarantee that your code conforms to a fairly rigid and exact structure, such as the one presented in the question, you could write a very basic awk interpreter for it.

For example:

BEGIN {
  ERROR = "ERROR: ";
  WARNING = "WARNING: ";
}

# Start of switch statement
/switch/ { 
  # Cursory check for valid variable name: must start with a letter or underscore,
  # and be composed of alphanumeric characters or underscores.
  if ($0 !~ /switch\([A-Za-z_]+[A-Za-z0-9_]*\)/)
    print ERROR "switch statement '" $0 "' has a syntax error.";

  switch_stmnt = 1;
  next;
}

# Start of for statement
/for/ {
  # For loop can have lots of various stuff between parentheses, so hard to check.
  # But, if you know it will always be `(i=0;i<10;i++)`, then it's much easier to 
  # create a rule.
  if ($0 !~ /for\(.*;.*;.*\)/)
    print ERROR "for statement '" $0 "' has a syntax error.";

  for_stmnt = 1;
  next;
}

# Start of case statement
/case/ { 
  # Check if in switch
  if (! switch_stmnt)
    print ERROR "case statement '" $0 "' outside of switch statement.";

  # Already in a case statement
  if (case_stmnt)
    print WARNING "case statement fall-through.";

  # Check syntax
  if ($2 !~ /[A-Za-z0-9_]+/ || $3 != ":")
    print ERROR "case statement '" $0 "' has a syntax error.";

  case_stmnt = 1;
}

# Default
/default/ {
  # Check if in switch
  if (! switch_stmnt)
    print ERROR "default statement '" $0 "' outside of switch statement.";

  # Already in a case statement
  if (case_stmnt)
    print WARNING "case statement fall-through.";
}

# Break
/break;/ {
  if (case_stmnt) { case_stmnt = 0; }
  else if (for_stmnt) { }
  else { print ERROR "'break' outside of case statement or for loop."; }
}

# Start of control structure
/{/ { ++brace; }

# End of control structure
/}/ {
  if (switch_stmnt) {
    switch_stmnt = 0;
    case_stmnt = 0;
  }
  else if (for_stmnt)
    for_stmnt = 0;

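  # Only decrement for a matched brace, so the count never goes negative
  # and the END check is not triggered by an extra closing brace.
  if (brace == 0)
    print ERROR "Extra closing brace '}' with no matching open brace.";
  else
    --brace;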
}

{
  # Do syntax checking on regular lines, eg. "number = 'a';"
  next;
}

END {
  if (switch_stmnt || for_stmnt || brace)
    print ERROR "Unterminated for or switch statement at end of file.";
}

This checks that a few statements conform to a few rules. You can extend it with more regex rules and flags. Plain statements without keywords are the hardest part, since they could be declarations, assignments, function calls, and so on. But if you will only be making assignments such as number = 'a'; as above, those lines are not hard to match either (something like /[A-Za-z_]+[A-Za-z0-9_]* = '.'/).
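
As a rough sketch of that last idea, the plain-assignment check could look something like the following. It is wrapped in a shell function so it can be tried on its own. The function name check_assignments, the error text, and the anchored, whitespace-tolerant form of the pattern are my assumptions, not part of the script above.

```sh
# check_assignments is a hypothetical helper name used for illustration.
check_assignments() {
  awk -v q="'" '
    BEGIN {
      # Anchored form of the suggested pattern: identifier, "=", one quoted
      # character, then a semicolon. q holds a single quote character.
      assign = "^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=[ \t]*" q "." q ";[ \t]*$"
    }
    # Leave keyword lines, brace-only lines and blank lines to other rules.
    /switch|case|default|for|break/ || /^[ \t]*[{}]?[ \t]*$/ { next }
    # Anything left must be a plain assignment.
    $0 !~ assign {
      print "ERROR: line " NR ": unrecognised statement: " $0
    }
  ' "$@"
}
```

Run it as check_assignments soalswitch. In the full checker, the same test would go in the final catch-all block in place of the placeholder comment.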

You can also mix if with switch: use if tests to map the value to a category first, then switch on that category.

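# ----------------------------------------------
# Requires gawk: switch and strtonum() are gawk extensions.
BEGIN {
    # Second command-line argument, converted to a number.
    A = strtonum(ARGV[2]);

    # Map the value onto a category. Values such as 10 and 20 match none
    # of the ranges, so T stays unset and the switch takes the default branch.
    if (A >=  0 && A < 10) T = 1;
    if (A >= 11 && A < 20) T = 2;
    if (A >= 21)           T = 3;

    switch (T) {
        case 1:  print "1\n"
                 break
        case 2:  print "2\n"
                 break
        case 3:  print "3\n"
                 break
        default: print "0\n"
                 break
    }
}
# ----------------------------------------------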
