Find the first printf format sequence in a C++ string

I am looking for the most concise and efficient way to find the first printf format sequence (conversion specification) in a C++ string (I cannot use std::regex, as it is not yet implemented in most compilers).

So the problem is to write an optimized function that returns the position pos of the beginning of the first printf format sequence and its length n, given an input string str:

inline void detect(const std::string& str, int& pos, int& n);

For example, for:

  • %d -> pos = 0 and n = 2
  • the answer is: %05d -> pos = 15 and n = 4
  • the answer is: %% %4.2f haha -> pos = 18 and n = 5

How to do that (clever and tricky ways are welcome)?

Scan forward for %, then parse the content from there. There are some quirky cases, but they are not THAT bad (I'm not sure you want to make it inline, though).

General principle (I'm just typing as I go along, so this is probably not the BEST code ever written, and I haven't tried to compile it at all):

#include <cctype>
#include <string>

inline void detect(const std::string& str, int& pos, int& n)
{
    pos = -1;    // pos == -1 and n == 0 mean "no format sequence found".
    n = 0;
    std::string::size_type last_pos = 0;
    for (;;)
    {
        last_pos = str.find('%', last_pos);
        if (last_pos == std::string::npos)
            break;    // Nothing found.
        if (last_pos == str.length() - 1)
            break;    // Found a stray '%' at the end of the string.
        std::string::size_type i = last_pos + 1;   // Index of the character being examined.
        char ch = str[i];

        if (ch == '%')   // Double percent -> escaped %. Go on to the next one.
        {
            last_pos += 2;
            continue;
        }
        // Skip flags, width, precision and length modifiers.
        while (std::isdigit(static_cast<unsigned char>(ch)) || ch == '.' || ch == '-' || ch == '*' ||
               ch == '+' || ch == 'l' || ch == 'L' || ch == 'z' ||
               ch == 'h' || ch == 't' || ch == 'j' || ch == ' ' ||
               ch == '#' || ch == '\'')
        {
            i++;
            if (i == str.length())
                return;    // Ran off the end: unterminated sequence, nothing found.
            ch = str[i];
        }
        // The string below may need appending to, depending on the version
        // of printf.
        if (std::string("AacdeEfFgGiopusxX").find(ch) != std::string::npos)
        {
            pos = static_cast<int>(last_pos);
            n = static_cast<int>(i - last_pos + 1);   // Includes the '%' and the conversion character.
            return;
        }
        // Not a valid conversion character: skip this '%' and keep searching.
        last_pos++;
    }
}

edit2: The long character test:

             std::isdigit(static_cast<unsigned char>(ch)) || ch == '.' || ch == '-' || ch == '*' ||
                 ch == '+' || ch == 'l' || ch == 'L' || ch == 'z' ||
                 ch == 'h' || ch == 't' || ch == 'j' || ch == ' ' ||
                 ch == '#' || ch == '\'' ...

is probably better written as:

             std::string("0123456789.-*+lLzhtj #'").find(ch) != std::string::npos ...
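Taking the character-set idea one step further, the whole scan can be sketched with std::string::find and std::string::find_first_not_of. This is only a minimal sketch, not the code above: the names detect_compact and check_examples are made up for illustration, the bool return value is an assumption (the question's signature returns void), and the two character sets are simply the ones used in this answer.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns true and fills pos/n when a conversion
// specification is found, false otherwise (pos and n are then left untouched).
inline bool detect_compact(const std::string& str, int& pos, int& n)
{
    static const std::string modifiers = "0123456789.-*+lLzhtj #'";
    static const std::string conversions = "AacdeEfFgGiopusxX";

    std::string::size_type start = 0;
    while ((start = str.find('%', start)) != std::string::npos)
    {
        // First character after '%' that is not a flag, width, precision
        // or length-modifier character.
        std::string::size_type end = str.find_first_not_of(modifiers, start + 1);
        if (end == std::string::npos)
            return false;                 // only modifier characters (or nothing) follow
        if (end == start + 1 && str[end] == '%')
        {
            start += 2;                   // "%%" is an escaped percent sign
            continue;
        }
        if (conversions.find(str[end]) != std::string::npos)
        {
            pos = static_cast<int>(start);
            n = static_cast<int>(end - start + 1);
            return true;
        }
        ++start;                          // not a valid sequence; keep scanning
    }
    return false;
}

// The three examples from the question.
inline void check_examples()
{
    int pos = 0, n = 0;
    assert(detect_compact("%d", pos, n) && pos == 0 && n == 2);
    assert(detect_compact("the answer is: %05d", pos, n) && pos == 15 && n == 4);
    assert(detect_compact("the answer is: %% %4.2f haha", pos, n) && pos == 18 && n == 5);
}
```

Like the longer version, this accepts any mix of modifier characters, so it is just as permissive about malformed sequences.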

Now, that's your homework done. Please report back with the grade you get.

Edit: It should be noted that some things that a regular printf will "reject" are accepted by the above code, e.g. "%.......5......6f", "%-5-6d" or "%-----09---5555555555555555llllld". If you want the code to reject these sorts of things, it's not a huge amount of extra work: you just need a little bit of logic in the "check for special characters or digit" step to track "have we seen this character before", since in most cases a special character should only be allowed once. And as the comment says, I may have missed a couple of valid format specifiers. It gets trickier still if you also need to cope with rules such as "this 'l' is not allowed with 'c'". But if the input isn't "malicious" (e.g. you want to annotate which lines of a working C source file contain format specifiers), the above should work reasonably well.
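One way to add that stricter checking is to parse the pieces in their usual order (flags, width, precision, length modifier, conversion character) instead of accepting any mix of characters. The sketch below is an illustration under assumptions, not a full printf validator: strict_length and strict_examples are hypothetical names, the flag set includes the POSIX ' flag, each flag is allowed only once as suggested above, and it does not check which length modifiers go with which conversions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: returns the length of a well-formed conversion
// specification starting at str[start] (expected to be '%'), or 0 if the
// text there does not follow the flags/width/precision/length/conversion order.
inline std::string::size_type strict_length(const std::string& str,
                                            std::string::size_type start)
{
    const std::string flags = "-+ #0'";
    const std::string conversions = "AacdeEfFgGiopusxX";
    std::string::size_type i = start + 1;
    auto is_digit = [&](std::string::size_type k) {
        return k < str.size() && std::isdigit(static_cast<unsigned char>(str[k])) != 0;
    };

    // 1. Flags: any order, but each one at most once.
    std::string seen;
    while (i < str.size() && flags.find(str[i]) != std::string::npos)
    {
        if (seen.find(str[i]) != std::string::npos)
            return 0;                          // repeated flag, e.g. "%--5d"
        seen += str[i++];
    }
    // 2. Width: '*' or a run of digits (both optional).
    if (i < str.size() && str[i] == '*') { ++i; }
    else { while (is_digit(i)) ++i; }
    // 3. Precision: '.' followed by '*' or by optional digits.
    if (i < str.size() && str[i] == '.')
    {
        ++i;
        if (i < str.size() && str[i] == '*') { ++i; }
        else { while (is_digit(i)) ++i; }
    }
    // 4. Length modifier: "hh", "ll", or a single one of h l j z t L.
    if (str.compare(i, 2, "hh") == 0 || str.compare(i, 2, "ll") == 0) { i += 2; }
    else if (i < str.size() && std::string("hljztL").find(str[i]) != std::string::npos) { ++i; }
    // 5. Conversion character.
    if (i < str.size() && conversions.find(str[i]) != std::string::npos)
        return i - start + 1;
    return 0;
}

// A few of the strings mentioned in this answer and in the question.
inline void strict_examples()
{
    assert(strict_length("%05d", 0) == 4);
    assert(strict_length("%4.2f", 0) == 5);
    assert(strict_length("%-5-6d", 0) == 0);
    assert(strict_length("%.......5......6f", 0) == 0);
    assert(strict_length("%-----09---5555555555555555llllld", 0) == 0);
}
```

A scanner could call this at every '%' that is not part of "%%" and move on to the next '%' whenever it returns 0.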
