Parsing custom format file in C#

I need to parse a custom file format in C#: the PBX file of an Xcode project. There is no official documentation for the format, but it is fairly straightforward. Here is a simple example:

// !$*UTF8*$!
{
    archiveVersion = 1;
    classes = {
    };
    objectVersion = 46;
    objects = {

        /* Begin PBXBuildFile section */
        5143B90C1884374800F27FD8 /* Foundation.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 5143B90B1884374800F27FD8 /* Foundation.framework */; };
        5143B90E1884374800F27FD8 /* CoreGraphics.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 5143B90D1884374800F27FD8 /* CoreGraphics.framework */; };
        5143B9101884374800F27FD8 /* UIKit.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 5143B90F1884374800F27FD8 /* UIKit.framework */; };
        /* End PBXBuildFile section */
    };
    rootObject = 5143B9001884374800F27FD8 /* Project object */;
}

The objects section contains a sequence of object definitions: each object's unique ID followed by its properties. Note that the file contains comments, and that property values can also be enclosed in quotes.

A complete example of a PBX file is here.

Now I need to build a DOM of the file. What is the best approach for this kind of task?

Use a parser (regex is a no-go because of the nested braces). Pick one whose syntax you are comfortable with.

I guess you are new to this, which is why I grouped the parsers by approach: top-down, bottom-up, and combinator. My personal preference is bottom-up, because defining mathematical expressions feels more natural to me that way, but you should not run into that kind of problem here.

As of 2014-01-28, NLT includes a simple reader for PBXProj files.

I've found that the Sprache project works really well for this type of grammar.

For simple parsing cases, regexes can be enough too.
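As an illustration of a case where a regex can be enough, here is a minimal sketch that pulls the PBXBuildFile entries out of the file without parsing the nesting around them. It assumes each entry sits on a single line and that IDs are 24 uppercase hex digits, as in the sample above. The class name PbxBuildFileScanner is made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts single-line PBXBuildFile entries only.
public static class PbxBuildFileScanner
{
    // <id> /* <name> */ = {isa = PBXBuildFile; fileRef = <ref> ...
    private static readonly Regex BuildFileLine = new Regex(
        @"^\s*(?<id>[0-9A-F]{24})\s*/\*\s*(?<name>.*?)\s*\*/\s*=\s*" +
        @"\{isa\s*=\s*PBXBuildFile;\s*fileRef\s*=\s*(?<ref>[0-9A-F]{24})",
        RegexOptions.Multiline);

    public static List<(string Id, string Name, string FileRef)> Scan(string text)
    {
        var result = new List<(string Id, string Name, string FileRef)>();
        foreach (Match m in BuildFileLine.Matches(text))
        {
            result.Add((m.Groups["id"].Value, m.Groups["name"].Value, m.Groups["ref"].Value));
        }
        return result;
    }
}
```

This only works because the entries are flat. As soon as you need the nested structure itself, a regex stops being the right tool.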

I use the Regex class when appropriate, but for more structured data like what you show here, I would turn to ANTLR, which is documented for C# here.

If you need to be able to match nested braces, regexes will not work. You could use a parser generator like ANTLR, but this format looks simple enough to write your own recursive descent parser.
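To give an idea of how small such a recursive descent parser can be, here is a rough sketch. It assumes a generic DOM: dictionaries become Dictionary<string, object>, lists become List<object>, and every scalar stays a string. The class name PbxParser and all member names are made up for this example. Support for parenthesized lists is an assumption about full project files, since the sample above contains none. Quoted-string escapes are simplified: a backslash just keeps the following character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: dictionaries map to Dictionary<string, object>, lists to List<object>, scalars to string.
public sealed class PbxParser
{
    private readonly string _s; // full input text
    private int _i;             // current read offset

    private PbxParser(string text) { _s = text; }

    public static Dictionary<string, object> Parse(string text)
    {
        var p = new PbxParser(text);
        var root = p.ParseDictionary();
        if (p.Peek() != '\0') throw p.Error("unexpected trailing input");
        return root;
    }

    private FormatException Error(string message) => new FormatException(message + " at offset " + _i);

    // Skips whitespace, /* block */ comments and // line comments.
    private void SkipTrivia()
    {
        while (_i < _s.Length)
        {
            if (char.IsWhiteSpace(_s[_i])) { _i++; continue; }
            bool block = string.CompareOrdinal(_s, _i, "/*", 0, 2) == 0;
            bool line = string.CompareOrdinal(_s, _i, "//", 0, 2) == 0;
            if (!block && !line) return;
            int end = _s.IndexOf(block ? "*/" : "\n", _i + 2, StringComparison.Ordinal);
            if (end < 0 && block) throw Error("unterminated comment");
            _i = end < 0 ? _s.Length : end + (block ? 2 : 1);
        }
    }

    // Returns the next significant character without consuming it ('\0' at end of input).
    private char Peek() { SkipTrivia(); return _i < _s.Length ? _s[_i] : '\0'; }

    // Consumes the next significant character, which must be c.
    private void Expect(char c)
    {
        if (Peek() != c) throw Error("expected '" + c + "'");
        _i++;
    }

    // dictionary := '{' (scalar '=' value ';')* '}'
    private Dictionary<string, object> ParseDictionary()
    {
        var dict = new Dictionary<string, object>();
        Expect('{');
        while (Peek() != '}')
        {
            string key = ParseScalar();
            Expect('=');
            dict[key] = ParseValue();
            Expect(';');
        }
        Expect('}');
        return dict;
    }

    // list := '(' (value ','?)* ')'   (not in the sample above; an assumption about full files)
    private List<object> ParseList()
    {
        var list = new List<object>();
        Expect('(');
        while (Peek() != ')')
        {
            list.Add(ParseValue());
            if (Peek() == ',') _i++;
        }
        Expect(')');
        return list;
    }

    private object ParseValue() =>
        Peek() == '{' ? ParseDictionary() : Peek() == '(' ? (object)ParseList() : ParseScalar();

    // scalar := quoted string (a backslash keeps the next character literally) or bare token
    private string ParseScalar()
    {
        if (Peek() == '"')
        {
            var sb = new StringBuilder();
            for (_i++; _i < _s.Length && _s[_i] != '"'; _i++)
                sb.Append(_s[_i] == '\\' && _i + 1 < _s.Length ? _s[++_i] : _s[_i]);
            if (_i >= _s.Length) throw Error("unterminated string");
            _i++; // closing quote
            return sb.ToString();
        }
        int start = _i;
        while (_i < _s.Length && !char.IsWhiteSpace(_s[_i]) && ";,={}()\"".IndexOf(_s[_i]) < 0
               && string.CompareOrdinal(_s, _i, "/*", 0, 2) != 0) _i++;
        if (_i == start) throw Error("expected a value");
        return _s.Substring(start, _i - start);
    }
}
```

Calling PbxParser.Parse on the file contents returns the root dictionary; casting its "objects" entry to Dictionary<string, object> gives the object table keyed by ID. Comments are discarded here. If your DOM needs to keep them (for example to write the file back out unchanged), the trivia-skipping step is where that would have to change.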

Before we can show you how to write the parser, we need to know what kind of DOM you want to output.
