
Cleaner regular expression code in C?

I've got a C program that processes output from some shell commands. For 'ps', I'm interested in the following five fields. I specify the fields I want by name, build the pattern for the GLib regex engine, and then parse and process the results.

Is there a good way to organize fields, patterns, and formats/types that yields readable and maintainable code? What I have so far works, but it doesn't look very good. I'm developing on OS X, but will want to port to other platforms later.

Also, is there a way to get behavior like C#'s @ (verbatim string) operator, to eliminate half the backslashes in my patterns?

Thanks.

const char field_pid[] = "pid";
const char field_lstart[] = "lstart";
const char field_ruser[] = "ruser";
const char field_cputime[] = "cputime";
const char field_command[] = "command";

char pattern[] = "\\s*(?<pid>\\d+)\\s+(?<lstart>\\w+\\s+\\w+\\s+\\d+\\s+[\\d:]+\\s+\\d+)\\s+(?<ruser>\\w+)\\s+(?<cputime>[\\d:\\.]+)\\s+(?<command>.+)";

// Do the regex match.
...

// Extract the matching strings.
gchar *pid = g_match_info_fetch_named(match_info, field_pid);
gchar *lstart = g_match_info_fetch_named(match_info, field_lstart);
gchar *ruser = g_match_info_fetch_named(match_info, field_ruser);
gchar *cputime = g_match_info_fetch_named(match_info, field_cputime);
gchar *command = g_match_info_fetch_named(match_info, field_command);

// Parse and process the strings.
...

Here are several options for improvement:

  • Use the G_REGEX_EXTENDED option when compiling the pattern. Whitespace in the pattern is then ignored, and # starts a comment that runs to the end of the line.

  • Split the regex across several lines.

  • Read the regex from an external file instead of embedding it in the C source. (You can write a utility function for this, or use GLib's configuration-reading mechanisms.) Standard C has no raw string literals, so this is the only way to cure the backslashitis.
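For the last option, a minimal sketch of such a utility function in standard C follows. The name read_pattern_file is hypothetical, and the sketch assumes one pattern per file; it reads the whole file and strips any trailing newline so the pattern is not polluted by it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a malloc'd,
   NUL-terminated string, dropping trailing newline characters.
   Returns NULL on I/O or allocation failure; the caller must free(). */
char *read_pattern_file(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    char *buf = NULL;
    size_t len = 0, cap = 0;
    for (;;) {
        /* Keep room for at least one more byte plus the terminator. */
        if (cap - len < 2) {
            size_t new_cap = cap ? cap * 2 : 256;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        size_t want = cap - len - 1;
        size_t got = fread(buf + len, 1, want, fp);
        len += got;
        if (got < want)   /* short read: end of file or an error */
            break;
    }

    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    /* A pattern file usually ends with a newline; strip it (and any CR). */
    while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == '\r'))
        len--;
    buf[len] = '\0';
    return buf;
}
```

Inside the file the pattern is written with single backslashes, e.g. \s*(?<pid>\d+)\s+... . If the file spreads the pattern over several lines, compile it with G_REGEX_EXTENDED so the embedded newlines are ignored. If GLib is already a dependency, g_file_get_contents() can replace the manual reading loop.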

Applying the first two suggestions, the resulting regex might look like this (compile it with G_REGEX_EXTENDED, or the spaces become part of the pattern):

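/* Adjacent string literals are concatenated at compile time, so each
   field can sit on its own line without trailing-backslash splices. */
const char *pattern =
    "\\s*"
    "(?<pid> \\d+ ) \\s+"
    "(?<lstart> \\w+ \\s+ \\w+ \\s+ \\d+ \\s+ [\\d:]+ \\s+ \\d+ ) \\s+"
    "(?<ruser> \\w+ ) \\s+"
    "(?<cputime> [\\d:\\.]+ ) \\s+"
    "(?<command> .+ )";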

Still far from perfect, but much more readable than what you started with.
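The question also asks how to keep field names and patterns organized. One possible approach, offered as an assumption rather than as part of the answer above, is a table that pairs each field name with its sub-pattern and builds the regex from it, so the group name in the pattern and the name passed to g_match_info_fetch_named() cannot drift apart. The struct ps_field, the ps_fields table, and build_ps_pattern are all hypothetical names; the sub-patterns are copied from the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical field table: each ps field name doubles as the regex group
   name, paired with the sub-pattern (taken from the question) for its value. */
struct ps_field {
    const char *name;
    const char *subpattern;
};

static const struct ps_field ps_fields[] = {
    { "pid",     "\\d+" },
    { "lstart",  "\\w+\\s+\\w+\\s+\\d+\\s+[\\d:]+\\s+\\d+" },
    { "ruser",   "\\w+" },
    { "cputime", "[\\d:\\.]+" },
    { "command", ".+" },
};

#define N_PS_FIELDS (sizeof ps_fields / sizeof ps_fields[0])

/* Builds the full pattern: leading optional whitespace, then one named group
   per table entry, with groups separated by mandatory whitespace.
   Returns a malloc'd string that the caller must free(), or NULL if
   allocation fails. */
static char *build_ps_pattern(void)
{
    /* First pass: compute the exact length, including the NUL terminator. */
    size_t len = strlen("\\s*") + 1;
    for (size_t i = 0; i < N_PS_FIELDS; i++) {
        if (i > 0)
            len += strlen("\\s+");
        len += strlen("(?<>)") + strlen(ps_fields[i].name)
             + strlen(ps_fields[i].subpattern);
    }

    char *pattern = malloc(len);
    if (pattern == NULL)
        return NULL;

    /* Second pass: append "(?<name>subpattern)" groups in table order. */
    strcpy(pattern, "\\s*");
    for (size_t i = 0; i < N_PS_FIELDS; i++) {
        if (i > 0)
            strcat(pattern, "\\s+");
        strcat(pattern, "(?<");
        strcat(pattern, ps_fields[i].name);
        strcat(pattern, ">");
        strcat(pattern, ps_fields[i].subpattern);
        strcat(pattern, ")");
    }

    assert(strlen(pattern) + 1 == len); /* both passes agree on the length */
    return pattern;
}
```

For this table the result is exactly the question's original pattern string. It could then be compiled with g_regex_new(pattern, 0, 0, &error), and each field fetched in a loop with g_match_info_fetch_named(match_info, ps_fields[i].name), freeing every returned string with g_free(). A third column holding a type tag or parse function is one way to cover the "formats/types" part of the question.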
