Parsing an iCalendar file in C
I am looking to parse iCalendar files using C. I already have an existing structure set up and the file reading in place, and I want to parse the file line by line into components.
For example, I would need to parse something like the following:
UID:uid1@example.com
DTSTAMP:19970714T170000Z
ORGANIZER;CN=John Doe;SENT-BY="mailto:smith@example.com":mailto:john.doe@example.com
CATEGORIES:Project Report, XYZ, Weekly Meeting
DTSTART:19970714T170000Z
DTEND:19970715T035959Z
SUMMARY:Bastille Day Party
Here are some of the rules:
- The CATEGORIES line, for example, would have 3 elements in an array for the values.
- There can be more than one optional parameter, as seen on the ORGANIZER line. There would just be another semicolon followed by the next parameter and value.
I was going about this using strchr() and strtok() and have got some basic elements from that; however, it is getting very messy and unorganized and does not seem to be the right way to do this.
How can I implement such a complex parser with the standard C libraries (or the POSIX regex library)? (Not looking for a whole solution, just a starting point.)
This answer supposes that you want to roll your own parser using Standard C. In practice it is usually better to use an existing parser, because its authors have already thought of and handled all the weird things that can come up.
My high-level approach would be:
In a function parse_line():
1. Call strcspn() on the pointer to identify the location of the first ':' or ';' (aborting if no marker is found).
2. While the marker is ';': call a function extract_name_value_pair(), passing the address of your parsing pointer. That function extracts the name and value and leaves the pointer on the ';' or ':' following the entry. Of course, this function must handle quote marks in the value and the fact that there might be a ';' or ':' in the value.
3. (At this point the parsing pointer is on the ':'.) Pass the rest of the line to a function parse_csv(), which will look for comma-separated values (again, being aware of quote marks) and store the results it finds in the right place.
The functions parse_csv() and extract_name_value_pair() should in fact be developed and tested first. Make a test suite and check that they work properly. Then write your overall parser function, which calls those functions as needed.
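As a starting point, the two helper functions might look roughly like the sketch below. It is only one possible shape, under several assumptions: the struct span type, the scan_unquoted() helper, and the exact signatures are invented for illustration; values are returned as slices of the input line (quotes included) so that no memory allocation is involved; and backslash escapes and folded lines are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a slice of the input line (not null-terminated). */
struct span {
    const char *start;
    size_t len;
};

/* Offset of the first character from `stops` that lies outside double
   quotes, or the length of `s` if there is none. */
size_t scan_unquoted(const char *s, const char *stops)
{
    assert(s != NULL && stops != NULL);
    int in_quotes = 0;
    size_t i;
    for (i = 0; s[i] != '\0'; i++) {
        if (s[i] == '"')
            in_quotes = !in_quotes;
        else if (!in_quotes && strchr(stops, s[i]) != NULL)
            break;
    }
    return i;
}

/* On entry *pp must point at a ';'. Stores the parameter name and value
   as spans and leaves *pp on the ';' or ':' that follows the entry.
   Returns 0 on success, -1 on malformed input (*pp is then unchanged). */
int extract_name_value_pair(const char **pp, struct span *name,
                            struct span *value)
{
    assert(pp != NULL && *pp != NULL && name != NULL && value != NULL);
    const char *p = *pp;
    if (*p != ';')
        return -1;
    p++;                                  /* step over the ';' */
    size_t n = strcspn(p, "=;:");
    if (n == 0 || p[n] != '=')
        return -1;                        /* missing name or '=' */
    name->start = p;
    name->len = n;
    p += n + 1;                           /* step over the '=' */
    size_t v = scan_unquoted(p, ";:");
    if (p[v] == '\0')
        return -1;                        /* no ';' or ':' after the value */
    value->start = p;
    value->len = v;                       /* surrounding quotes are kept */
    *pp = p + v;
    return 0;
}

/* Splits `s` on commas that are outside double quotes. Stores at most
   `max` spans in `out` and returns the number of values found, which may
   be larger than `max`. Whitespace around values is not trimmed. */
size_t parse_csv(const char *s, struct span *out, size_t max)
{
    assert(s != NULL && (out != NULL || max == 0));
    size_t count = 0;
    for (;;) {
        size_t len = scan_unquoted(s, ",");
        if (count < max) {
            out[count].start = s;
            out[count].len = len;
        }
        count++;
        if (s[len] == '\0')
            break;
        s += len + 1;                     /* step over the comma */
    }
    return count;
}
```

Because both functions only read the line and hand back slices, they can be tested in isolation with string literals before any storage code exists. Trimming the space after each comma and stripping the quotes would be separate, equally small helpers.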
Also, write all the memory allocation code as separate functions. Think of what data structure you want to store your parsed result in. Then code up that data structure, and test it, entirely independently of the parsing code. Only then, write the parsing code and call functions to insert the resulting data in the data structure.
You really don't want to have memory management code mixed up with parsing code. That makes it much harder to debug.
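To illustrate that separation, here is a minimal sketch of a storage module that knows nothing about iCalendar syntax. The struct value_list type and the value_list_* function names are hypothetical; a real parser would probably also need storage for the property name and its parameters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the values of one parsed line. */
struct value_list {
    char **items;      /* heap-allocated, null-terminated copies */
    size_t count;
    size_t capacity;
};

void value_list_init(struct value_list *vl)
{
    assert(vl != NULL);
    vl->items = NULL;
    vl->count = 0;
    vl->capacity = 0;
}

/* Appends a copy of the first `len` bytes of `text`.
   Returns 0 on success, -1 on allocation failure (stored values are
   left unchanged in that case). */
int value_list_add(struct value_list *vl, const char *text, size_t len)
{
    assert(vl != NULL && text != NULL);
    if (vl->count == vl->capacity) {
        size_t new_cap = vl->capacity ? vl->capacity * 2 : 4;
        char **grown = realloc(vl->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        vl->items = grown;
        vl->capacity = new_cap;
    }
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, len);
    copy[len] = '\0';
    vl->items[vl->count++] = copy;
    return 0;
}

/* Releases everything and resets the list to its empty state. */
void value_list_free(struct value_list *vl)
{
    assert(vl != NULL);
    for (size_t i = 0; i < vl->count; i++)
        free(vl->items[i]);
    free(vl->items);
    value_list_init(vl);
}
```

The parsing code then only ever calls value_list_add() with a pointer and a length, so an allocation bug and a parsing bug cannot hide behind each other.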
When making a function that accepts a string (e.g. all three named functions above, plus any other helpers you decide you need), you have a few options as to its interface, such as accepting a pointer to a null-terminated string, or a pointer together with a length.
Each way has its pros and cons: it's annoying to write null terminators everywhere and then unwrite them later if need be; but it's also annoying when you want to use strcspn() or other string functions but you received a length-counted piece of string.
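If you do go with length-counted strings, one way to soften the second annoyance is a small length-aware counterpart to strcspn(). The sketch below is an assumption about how such a helper could look; the name span_cspn() is made up and is not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Like strcspn(), but for a piece of text that is `len` bytes long and
   need not be null-terminated. Returns the offset of the first byte of
   `s` that occurs in `reject`, or `len` if there is none. */
size_t span_cspn(const char *s, size_t len, const char *reject)
{
    assert(s != NULL && reject != NULL);
    size_t reject_len = strlen(reject);
    size_t i;
    for (i = 0; i < len; i++) {
        if (memchr(reject, s[i], reject_len) != NULL)
            break;
    }
    return i;
}
```

The helper never reads past s[len - 1], so it can be pointed at the middle of a larger buffer without writing a temporary null terminator first.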
Also, when the function needs to let the caller know how much text it consumed in parsing, you have two options: return the number of characters consumed, or accept the address of the caller's parsing pointer and update it.
There's no one right answer; with experience you will get better at deciding which option leads to the cleanest code.
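For comparison, here is the same tiny task (measuring the property name at the start of a line) written in two styles of reporting consumed text: once returning a count, and once advancing the caller's pointer. Both function names are hypothetical and the sketch only illustrates the two calling conventions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Style 1: return the number of characters consumed; the caller
   advances its own pointer. */
size_t name_length(const char *s)
{
    assert(s != NULL);
    return strcspn(s, ":;");
}

/* Style 2: take the address of the caller's parsing pointer and advance
   it. *pp is left on the ':' or ';' marker, or on the terminator if no
   marker was found. The length of the name is returned as well. */
size_t take_name(const char **pp)
{
    assert(pp != NULL && *pp != NULL);
    size_t n = strcspn(*pp, ":;");
    *pp += n;
    return n;
}
```

With the first style the return value is fully used up by the count, so any other result has to come back through an out parameter; with the second, the return value stays free for a status code or, as here, a length.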