
Haskell/Parsec: how do I use Text.Parsec.Token with Text.Parsec.Indent (from the indents package)

The indents package for Haskell's Parsec provides a way to parse indentation-style languages (like Haskell and Python). It redefines the Parser type, so how do you use the token parser functions exported by Parsec's Text.Parsec.Token module, which are of the normal Parser type?

Background

Parsec comes with a load of modules. Most of them export a bunch of useful parsers (e.g. newline from Text.Parsec.Char, which parses a newline) or parser combinators (e.g. count n p from Text.Parsec.Combinator, which runs the parser p n times).
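As a small illustration of those two examples, here is a minimal sketch that combines count and newline. The parser name threeDigitsThenNewline is made up for this example.

```haskell
import Text.Parsec (parse)
import Text.Parsec.Char (digit, newline)
import Text.Parsec.Combinator (count)
import Text.Parsec.String (Parser)

-- Hypothetical parser: exactly three digits, then a newline.
-- `count 3 digit` runs the `digit` parser three times and collects the results.
threeDigitsThenNewline :: Parser String
threeDigitsThenNewline = count 3 digit <* newline

main :: IO ()
main = print (parse threeDigitsThenNewline "" "123\n")  -- prints: Right "123"
```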

However, the module Text.Parsec.Token would like to export functions which are parametrized by the user with features of the language being parsed, so that, for example, the braces p function will run the parser p after parsing a '{' and before parsing a '}', ignoring things like comments, the syntax of which depends on your language.

The way that Text.Parsec.Token achieves this is that it exports a single function makeTokenParser, which you call, giving it the parameters of your specific language (like what a comment looks like), and it returns a record containing all of the functions in Text.Parsec.Token, adapted to your language as specified.
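For readers who have not used makeTokenParser, the following is a minimal sketch of the ordinary (non-indentation) case. The names exampleDef, lexer, braces and reserved are chosen for this example only, and the language definition (haskellStyle plus one reserved word) is an assumption, not anything from the question.

```haskell
import Text.Parsec (parse)
import Text.Parsec.String (Parser)
import Text.Parsec.Language (haskellStyle)
import qualified Text.Parsec.Token as P

-- Hypothetical language definition: Haskell-style comments and one keyword.
exampleDef :: P.LanguageDef ()
exampleDef = haskellStyle { P.reservedNames = ["something"] }

-- makeTokenParser returns a record whose fields are the token parsers,
-- specialised to the language definition it was given.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser exampleDef

braces :: Parser a -> Parser a
braces = P.braces lexer

reserved :: String -> Parser ()
reserved = P.reserved lexer

main :: IO ()
main = print (parse (braces (reserved "something")) ""
                    "{ something {- a comment -} }")  -- prints: Right ()
```

Note that every type here is fixed to the Identity monad (LanguageDef, TokenParser and Parser are all aliases that use it), which is what the rest of this question runs into.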

Of course, in an indentation-style language, these would need to be adapted further (perhaps? here's where I'm not sure; I'll explain in a moment), so I note that the (presumably obsolete) IndentParser package provides a module Text.ParserCombinators.Parsec.IndentParser.Token which looks to be a drop-in replacement for Text.Parsec.Token.

I should mention at some point that all the Parsec parsers are monadic functions, so they do magic things with state so that error messages can say at what line and column in the source file the error appeared.

My Problem

For a couple of small reasons it appears to me that the indents package is more-or-less the current version of IndentParser. However, it does not provide a module that looks like Text.ParserCombinators.Parsec.IndentParser.Token; it only provides Text.Parsec.Indent, so I am wondering how one goes about getting all the token parsers from Text.Parsec.Token (like reserved "something", which parses the reserved keyword "something", or like braces, which I mentioned earlier).

It would appear to me that (the new) Text.Parsec.Indent uses some sort of monadic state magic to work out at what column bits of source code are, so that it doesn't need to modify the token parsers like whiteSpace from Text.Parsec.Token, which is probably why it doesn't provide a replacement module. But I am having a problem with types.

You see, without Text.Parsec.Indent, all my parsers are of type Parser Something, where Something is the return type and Parser is a type alias defined in Text.Parsec.String as

type Parser = Parsec String ()

but with Text.Parsec.Indent, instead of importing Text.Parsec.String, I use my own definition

type Parser a = IndentParser String () a

which makes all my parsers of type IndentParser String () Something, where IndentParser is defined in Text.Parsec.Indent. But the token parsers that I'm getting from makeTokenParser in Text.Parsec.Token are of the wrong type.

If this isn't making much sense by now, it's because I'm a bit lost. The type issue is discussed a bit here.


What I've tried is replacing the one definition of Parser above with the other, but then when I try to use one of the token parsers from Text.Parsec.Token, I get the compile error

Couldn't match expected type `Control.Monad.Trans.State.Lazy.State
                                Text.Parsec.Pos.SourcePos'
            with actual type `Data.Functor.Identity.Identity'
Expected type: P.GenTokenParser
                 String
                 ()
                 (Control.Monad.Trans.State.Lazy.State Text.Parsec.Pos.SourcePos)
  Actual type: P.TokenParser ()

Links

Sadly, neither of the examples above uses token parsers like those in Text.Parsec.Token.

What are you trying to do?

It sounds like you want to have your parsers defined everywhere as being of type

Parser Something

(where Something is the return type) and to make this work by hiding and redefining the Parser type which is normally imported from Text.Parsec.String or similar. You still need to import Text.Parsec.String for its instance declarations (the Stream instance that lets a String be parsed in your monad); do this with the line:

import Text.Parsec.String ()

Your definition of Parser is correct. Alternatively and equivalently (for those following the chat in the comments) you can use

import Control.Monad.State
import Text.Parsec.Pos (SourcePos)

type Parser = ParsecT String () (State SourcePos)

and possibly do away with the import Text.Parsec.Indent (IndentParser) in the file in which this definition appears.

Error, error on the wall

Your problem is that you're looking at the wrong part of the compiler error message. You're focusing on

Couldn't match expected type `State SourcePos' with actual type `Identity'

when you should be focusing on

Expected type: P.GenTokenParser ...
  Actual type: P.TokenParser ...

It compiles!

Where you "import" parsers from Text.Parsec.Token, what you actually do, of course (as you briefly mentioned), is first to define a record of your language parameters and then to pass this to the function makeTokenParser, which returns a record containing the token parsers.

You must therefore have some lines that look something like this:

import Text.Parsec.Language (haskellStyle)
import qualified Text.Parsec.Token as P

beetleDef :: P.LanguageDef st
beetleDef =
    haskellStyle {
        parameters, parameters etc.
        }

lexer :: P.TokenParser ()
lexer = P.makeTokenParser beetleDef

... but a P.LanguageDef st is just a GenLanguageDef String st Identity, and a P.TokenParser () is really a GenTokenParser String () Identity.

You must change your type declarations to the following:

import Control.Monad.State
import Text.Parsec.Pos (SourcePos)
import Text.Parsec.Language (haskellStyle)
import qualified Text.Parsec.Token as P

beetleDef :: P.GenLanguageDef String st (State SourcePos)
beetleDef =
    haskellStyle {
        parameters, parameters etc.
        }

lexer :: P.GenTokenParser String () (State SourcePos)
lexer = P.makeTokenParser beetleDef

... and that's it! This will allow your "imported" token parsers to have type ParsecT String () (State SourcePos) Something, instead of Parsec String () Something (which is an alias for ParsecT String () Identity Something), and your code should now compile.

(For maximum generality, I'm assuming that you might be defining the Parser type in a file separate from, and imported by, the file in which you define your actual parser functions. Hence the two repeated import statements.)
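To make the changed declarations concrete, here is a self-contained sketch that compiles without the indents package. It rests on several assumptions: the Parser alias is the ParsecT String () (State SourcePos) form given earlier; the language definition is written out in full with made-up values, because haskellStyle itself has type LanguageDef st (fixed to Identity) and a record update only changes that type if all four parser fields (identStart, identLetter, opStart, opLetter) are overridden; and runBeetle is a hypothetical helper that stands in for whatever runner your indentation library provides.

```haskell
import Control.Monad.State (State, evalState)
import Text.Parsec (ParsecT, ParseError, runParserT,
                    letter, alphaNum, oneOf, eof, (<|>))
import Text.Parsec.Pos (SourcePos, initialPos)
import qualified Text.Parsec.Token as P

type Parser = ParsecT String () (State SourcePos)

-- Hypothetical language definition, spelled out field by field so that
-- every parser field is polymorphic in the underlying monad.
beetleDef :: P.GenLanguageDef String st (State SourcePos)
beetleDef = P.LanguageDef
    { P.commentStart    = "{-"
    , P.commentEnd      = "-}"
    , P.commentLine     = "--"
    , P.nestedComments  = True
    , P.identStart      = letter
    , P.identLetter     = alphaNum <|> oneOf "_'"
    , P.opStart         = oneOf ":!#$%&*+./<=>?@\\^|-~"
    , P.opLetter        = oneOf ":!#$%&*+./<=>?@\\^|-~"
    , P.reservedNames   = ["something"]
    , P.reservedOpNames = []
    , P.caseSensitive   = True
    }

lexer :: P.GenTokenParser String () (State SourcePos)
lexer = P.makeTokenParser beetleDef

-- Hypothetical runner: run the parser, then run the State layer
-- starting from an initial source position.
runBeetle :: Parser a -> String -> Either ParseError a
runBeetle p input = evalState (runParserT p () "" input) (initialPos "")

main :: IO ()
main = print (runBeetle (P.braces lexer (P.reserved lexer "something") <* eof)
                        "{ something -- comment\n}")  -- prints: Right ()
```

The import list for Text.Parsec is explicit on purpose: that module also exports a type named State (the parser state), which would clash with the State imported from Control.Monad.State.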

Thanks

Many thanks to Daniel Fischer for helping me with this.
