[英]Improving the readability of a FParsec parser
I have a hand-written CSS parser done in C# which is getting unmanageable and was trying to do it i FParsec to make it more mantainable. 我有一个在C#中完成的手写CSS解析器,它变得难以管理,并且正在尝试使用FParsec来使其更具可持续性。 Here's a snippet that parses a css selector element made with regexes: 这是一个解析用regexes制作的css选择器元素的片段:
var tagRegex = @"(?<Tag>(?:[a-zA-Z][_\-0-9a-zA-Z]*|\*))";
var idRegex = @"(?:#(?<Id>[a-zA-Z][_\-0-9a-zA-Z]*))";
var classesRegex = @"(?<Classes>(?:\.[a-zA-Z][_\-0-9a-zA-Z]*)+)";
var pseudoClassRegex = @"(?::(?<PseudoClass>link|visited|hover|active|before|after|first-line|first-letter))";
var selectorRegex = new Regex("(?:(?:" + tagRegex + "?" + idRegex + ")|" +
"(?:" + tagRegex + "?" + classesRegex + ")|" +
tagRegex + ")" +
pseudoClassRegex + "?");
var m = selectorRegex.Match(str);
if (m.Length != str.Length) {
cssParserTraceSwitch.WriteLine("Unrecognized selector: " + str);
return null;
}
string tagName = m.Groups["Tag"].Value;
string pseudoClassString = m.Groups["PseudoClass"].Value;
CssPseudoClass pseudoClass;
if (pseudoClassString.IsEmpty()) {
pseudoClass = CssPseudoClass.None;
} else {
switch (pseudoClassString.ToLower()) {
case "link":
pseudoClass = CssPseudoClass.Link;
break;
case "visited":
pseudoClass = CssPseudoClass.Visited;
break;
case "hover":
pseudoClass = CssPseudoClass.Hover;
break;
case "active":
pseudoClass = CssPseudoClass.Active;
break;
case "before":
pseudoClass = CssPseudoClass.Before;
break;
case "after":
pseudoClass = CssPseudoClass.After;
break;
case "first-line":
pseudoClass = CssPseudoClass.FirstLine;
break;
case "first-letter":
pseudoClass = CssPseudoClass.FirstLetter;
break;
default:
cssParserTraceSwitch.WriteLine("Unrecognized selector: " + str);
return null;
}
}
string cssClassesString = m.Groups["Classes"].Value;
string[] cssClasses = cssClassesString.IsEmpty() ? EmptyArray<string>.Instance : cssClassesString.Substring(1).Split('.');
allCssClasses.AddRange(cssClasses);
return new CssSelectorElement(
tagName.ToLower(),
cssClasses,
m.Groups["Id"].Value,
pseudoClass);
My first attempt yielded this: 我的第一次尝试产生了这个:
type CssPseudoClass =
| None = 0
| Link = 1
| Visited = 2
| Hover = 3
| Active = 4
| Before = 5
| After = 6
| FirstLine = 7
| FirstLetter = 8
type CssSelectorElement =
{ Tag : string
Id : string
Classes : string list
PseudoClass : CssPseudoClass }
with
static member Default =
{ Tag = "";
Id = "";
Classes = [];
PseudoClass = CssPseudoClass.None; }
open FParsec
let ws = spaces
let str = skipString
let strWithResult str result = skipString str >>. preturn result
let identifier =
let isIdentifierFirstChar c = isLetter c || c = '-'
let isIdentifierChar c = isLetter c || isDigit c || c = '_' || c = '-'
optional (str "-") >>. many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier"
let stringFromOptional strOption =
match strOption with
| Some(str) -> str
| None -> ""
let pseudoClassFromOptional pseudoClassOption =
match pseudoClassOption with
| Some(pseudoClassOption) -> pseudoClassOption
| None -> CssPseudoClass.None
let parseCssSelectorElement =
let tag = identifier <?> "tagName"
let id = str "#" >>. identifier <?> "#id"
let classes = many1 (str "." >>. identifier) <?> ".className"
let parseCssPseudoClass =
choiceL [ strWithResult "link" CssPseudoClass.Link;
strWithResult "visited" CssPseudoClass.Visited;
strWithResult "hover" CssPseudoClass.Hover;
strWithResult "active" CssPseudoClass.Active;
strWithResult "before" CssPseudoClass.Before;
strWithResult "after" CssPseudoClass.After;
strWithResult "first-line" CssPseudoClass.FirstLine;
strWithResult "first-letter" CssPseudoClass.FirstLetter]
"pseudo-class"
// (tag?id|tag?classes|tag)pseudoClass?
pipe2 ((pipe2 (opt tag)
id
(fun tag id ->
{ CssSelectorElement.Default with
Tag = stringFromOptional tag;
Id = id })) |> attempt
<|>
(pipe2 (opt tag)
classes
(fun tag classes ->
{ CssSelectorElement.Default with
Tag = stringFromOptional tag;
Classes = classes })) |> attempt
<|>
(tag |>> (fun tag -> { CssSelectorElement.Default with Tag = tag })))
(opt (str ":" >>. parseCssPseudoClass) |> attempt)
(fun selectorElem pseudoClass -> { selectorElem with PseudoClass = pseudoClassFromOptional pseudoClass })
But I'm not really liking how it's shaping up. 但我并不是很喜欢它的形成方式。 I was expecting to come up with something easier to understand, but the part parsing (tag?id|tag?classes|tag)pseudoClass? 我期待得到一些更容易理解的东西,但是部分解析(tag?id | tag?classes | tag)pseudoClass? with a few pipe2's and attempt's is really bad. 有几个pipe2和尝试是非常糟糕的。
Came someone with more experience in FParsec educate me on better ways to accomplish this? 来自有更多FParsec经验的人教会我更好的方法来实现这一目标吗? I'm thinking on trying FSLex/Yacc or Boost.Spirit instead of FParsec is see if I can come up with nicer code with them 我正在考虑尝试FSLex / Yacc或Boost.Spirit而不是FParsec,看看我是否可以用它们提出更好的代码
You could extract some parts of that complex parser to variables, eg: 您可以将该复杂解析器的某些部分提取到变量中,例如:
let tagid =
pipe2 (opt tag)
id
(fun tag id ->
{ CssSelectorElement.Default with
Tag = stringFromOptional tag
Id = id })
You could also try using an applicative interface , personally I find it easier to use and think than pipe2: 您也可以尝试使用应用程序界面 ,我个人觉得它比pipe2更容易使用和思考:
let tagid =
(fun tag id ->
{ CssSelectorElement.Default with
Tag = stringFromOptional tag
Id = id })
<!> opt tag
<*> id
As Mauricio said, if you find yourself repeating code in an FParsec parser, you can always factor out the common parts into a variable or custom combinator. 正如Mauricio所说,如果你发现自己在FParsec解析器中重复代码,你总是可以将公共部分分解为变量或自定义组合器。 This is one of the great advantages of combinator libraries. 这是组合子库的一大优势。
However, in this case you could also simplify and optimize the parser by reorganizing the grammer a bit. 但是,在这种情况下,您还可以通过重新组织语法来简化和优化解析器。 You could, for example, replace the lower half of the parseCssSelectorElement
parser with 例如,您可以使用替换parseCssSelectorElement
解析器的下半部分
let defSel = CssSelectorElement.Default
let pIdSelector = id |>> (fun str -> {defSel with Id = str})
let pClassesSelector = classes |>> (fun strs -> {defSel with Classes = strs})
let pSelectorMain =
choice [pIdSelector
pClassesSelector
pipe2 tag (pIdSelector <|> pClassesSelector <|>% defSel)
(fun tagStr sel -> {sel with Tag = tagStr})]
pipe2 pSelectorMain (opt (str ":" >>. parseCssPseudoClass))
(fun sel optPseudo ->
match optPseudo with
| None -> sel
| Some pseudo -> {sel with PseudoClass = pseudo})
By the way, if you want to parse a large number of string constants, it's more efficient to use a dictionary based parsers, like 顺便说一下,如果你想解析大量的字符串常量,那么使用基于字典的解析器就更有效了,比如说
let pCssPseudoClass : Parser<CssPseudoClass,unit> =
let pseudoDict = dict ["link", CssPseudoClass.Link
"visited", CssPseudoClass.Visited
"hover", CssPseudoClass.Hover
"active", CssPseudoClass.Active
"before", CssPseudoClass.Before
"after", CssPseudoClass.After
"first-line", CssPseudoClass.FirstLine
"first-letter", CssPseudoClass.FirstLetter]
fun stream ->
let reply = identifier stream
if reply.Status <> Ok then Reply(reply.Status, reply.Error)
else
let mutable pseudo = CssPseudoClass.None
if pseudoDict.TryGetValue(reply.Result, &pseudo) then Reply(pseudo)
else // skip to beginning of invalid pseudo class
stream.Skip(-reply.Result.Length)
Reply(Error, messageError "unknown pseudo class")
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.