
Find the first printf format sequence in a C++ string

I am looking for the most concise and efficient way to find the first printf format sequence (conversion specification) in a C++ string (I cannot use std::regex, as it is not yet implemented in most compilers).

So the problem is to write an optimized function that, given an input string str, returns the position pos where the first printf format sequence begins and its length n:

inline void detect(const std::string& str, int& pos, int& n);

For example:

  • %d -> pos = 0 and n = 2
  • the answer is: %05d -> pos = 15 and n = 4
  • the answer is: %% %4.2f haha -> pos = 18 and n = 5

How can I do that (clever and tricky ways are welcome)?

Scan forward for %, then parse the content from there. There are some quirky cases, but it's not THAT bad (not sure you want to make it inline, though).

General principle (I'm just typing as I go along, so this is probably not the BEST code ever written, and I haven't tried to compile it at all).

#include <cctype>
#include <string>

inline void detect(const std::string& str, int& pos, int& n)
{
    pos = -1;   // Assumption: -1 signals "no format sequence found".
    n = 0;
    std::string::size_type last_pos = 0;
    for (;;)
    {
        last_pos = str.find('%', last_pos);
        if (last_pos == std::string::npos)
            return;    // Nothing found.
        if (last_pos == str.length() - 1)
            return;    // Found a stray '%' at the end of the string.

        if (str[last_pos + 1] == '%')   // Double percent -> escaped %. Go on to the next.
        {
            last_pos += 2;
            continue;
        }
        pos = static_cast<int>(last_pos);
        std::string::size_type end = last_pos + 1;   // Index of the character being examined.
        while (end < str.length())
        {
            char ch = str[end];
            if (std::isdigit(static_cast<unsigned char>(ch)) || ch == '.' || ch == '-' || ch == '*' ||
                ch == '+' || ch == 'l' || ch == 'L' || ch == 'z' ||
                ch == 'h' || ch == 't' || ch == 'j' || ch == ' ' ||
                ch == '#' || ch == '\'')
            {
                ++end;   // Flag, width, precision or length modifier: keep scanning.
            }
            else
            {
                // The string below may need appending to, depending on the version
                // of printf.
                if (std::string("AacdeEfFgGiopusxX").find(ch) == std::string::npos)
                {
                    // Do something about the invalid string?
                }
                ++end;   // Include the conversion character itself.
                break;
            }
        }
        n = static_cast<int>(end - last_pos);
        return;
    }
}

edit2: This bit:

             if (isdigit(ch) || ch == '.' || ch == '-' || ch == '*' ||
                 ch == '+' || ch == 'l' || ch == 'L' || ch == 'z' || 
                 ch == 'h' || ch == 't' || ch == 'j' || ch == ' ' || 
                 ch == '#' || ch == '\'') ... 

is probably better written as:

             if (std::string("0123456789.-*+lLzhtj #'").find(ch) != std::string::npos) ... 
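Taking the lookup-string idea one step further, the whole run of flag, width, precision and length characters can probably be skipped with a single find_first_not_of call. The following is only a sketch under that assumption: find_format, the pair return type and the -1 "not found" convention are made up for this illustration and are not part of the original answer. It is just as permissive as the loop above and does not validate the conversion character.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: returns {pos, n} of the first conversion
// specification in str, or {-1, 0} if there is none.
inline std::pair<int, int> find_format(const std::string& str)
{
    // Characters allowed between '%' and the conversion character
    // (same set as the lookup string in edit2).
    static const char modifiers[] = "0123456789.-*+lLzhtj #'";

    std::string::size_type start = 0;
    while ((start = str.find('%', start)) != std::string::npos)
    {
        if (start + 1 >= str.size())
            break;                      // stray '%' at the end of the string
        if (str[start + 1] == '%')
        {
            start += 2;                 // "%%" is an escaped percent sign
            continue;
        }
        // First character that is not a flag/width/precision/length character.
        std::string::size_type end = str.find_first_not_of(modifiers, start + 1);
        if (end == std::string::npos)
            end = str.size();           // string ended before a conversion character
        else
            ++end;                      // include the conversion character
        return std::make_pair(static_cast<int>(start),
                              static_cast<int>(end - start));
    }
    return std::make_pair(-1, 0);
}
```

Called on the three strings from the question, this should give the (pos, n) pairs (0, 2), (15, 4) and (18, 5).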

Now, that's your homework done. Please report back with what grade you get.

Edit: It should be noted that the above code accepts some things that a regular printf will "reject", e.g. "%.......5......6f", "%-5-6d" or "%-----09---5555555555555555llllld". If you want the code to reject these sorts of things, it's not a huge amount of extra work: it just needs a little bit of logic in the "check for special characters or digits" step to ask "have we seen this character before", since in most cases a special character should only be allowed once. And as the comment says, I may have missed a couple of valid format specifiers. It gets trickier still if you also need to cope with rules such as "this 'l' is not allowed with 'c'". But if the input isn't "malicious" (e.g. you want to annotate on which lines of a working C source file there are format specifiers), the above should work reasonably well.
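For what it's worth, a stricter check could look roughly like the sketch below. It is not from the original answer: strict_spec_length is a hypothetical helper that assumes the usual flags, width, precision, length modifier, conversion order and, following the "seen before" idea, allows each flag only once. It does not attempt rules such as which length modifiers go with which conversion.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: s[i] must be '%'. Returns the length of a
// well-formed conversion specification starting at i, or 0 otherwise.
inline std::size_t strict_spec_length(const std::string& s, std::size_t i)
{
    const std::size_t size = s.size();
    if (i >= size || s[i] != '%')
        return 0;
    std::size_t p = i + 1;

    // 1. Flags: any order, but (assumption) each one at most once.
    std::string seen;
    while (p < size && std::string("-+ #0'").find(s[p]) != std::string::npos)
    {
        if (seen.find(s[p]) != std::string::npos)
            return 0;                   // "have we seen this character before"
        seen += s[p++];
    }

    // 2. Width: '*' or a run of digits.
    if (p < size && s[p] == '*')
        ++p;
    else
        while (p < size && std::isdigit(static_cast<unsigned char>(s[p])))
            ++p;

    // 3. Precision: '.' followed by '*' or an optional run of digits.
    if (p < size && s[p] == '.')
    {
        ++p;
        if (p < size && s[p] == '*')
            ++p;
        else
            while (p < size && std::isdigit(static_cast<unsigned char>(s[p])))
                ++p;
    }

    // 4. Length modifier: hh, h, ll, l, j, z, t or L.
    if (p < size && (s[p] == 'h' || s[p] == 'l'))
    {
        const char c = s[p++];
        if (p < size && s[p] == c)
            ++p;                        // "hh" or "ll"
    }
    else if (p < size && std::string("jztL").find(s[p]) != std::string::npos)
    {
        ++p;
    }

    // 5. Conversion character (list taken from the answer's code).
    if (p < size && std::string("AacdeEfFgGiopusxX").find(s[p]) != std::string::npos)
        return p + 1 - i;
    return 0;
}
```

A caller could run this at each unescaped '%' and treat a return value of 0 as "invalid, keep scanning or report an error", depending on what is wanted.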
