Haskell read variable name

I need to write code that parses some language. I got stuck on parsing variable names: a name can be anything that is at least 1 character long, starts with a lowercase letter, and may contain the underscore '_' character. I think I made a good start with the following code:

identToken :: Parser String
identToken = do 
                       c <- letter
                       cs <- letdigs
                       return (c:cs)
             where letter = satisfy isLetter
                   letdigs = munch isLetter +++ munch isDigit +++ munch underscore
                   num = satisfy isDigit
                   underscore = \x -> x == '_'
                   lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?

ident :: Parser Ident
ident = do 
          _ <- skipSpaces
          s <- identToken
          skipSpaces; return $ s

idents :: Parser Command
idents = do 
          skipSpaces; ids <- many1 ident
          ...

This function, however, gives me weird results. If I call my test function

test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p = 
  case readP_to_S prog p of
    [(j, "")] -> Right j
    [] -> Left InvalidParse
    multipleRes -> Left (AmbiguousIdents multipleRes)
  where
    prog :: Parser [Ident]
    prog = do
      result <- many ident
      eof
      return result

like this:

test_parseIdents  "test"

I get this:

Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
    (["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])

Note that Parser a is just a synonym for ReadP a.

I also want to encode in the parser that variable names must start with a lowercase character.

Thank you for your help.

Part of the problem is with your use of the +++ operator. The following code works for me:

import Data.Char
import Text.ParserCombinators.ReadP

type Parser a = ReadP a
type Ident = String

identToken :: Parser String
identToken = do c <- satisfy lowerCase
                cs <- letdigs
                return (c:cs)
  where lowerCase = \x -> x `elem` ['a'..'z']
        underscore = \x -> x == '_'
        letdigs = munch (\c -> isLetter c || isDigit c || underscore c)

ident :: Parser Ident
ident = do _ <- skipSpaces
           s <- identToken
           skipSpaces
           return s

test_parseIdents :: String -> Either String [Ident]
test_parseIdents p = case readP_to_S prog p of
    [(j, "")]   -> Right j
    []          -> Left "Invalid parse"
    multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
  where prog :: Parser [Ident]
        prog = do result <- many ident
                  eof
                  return result

main = print $ test_parseIdents "test_1349_zefz"

So what went wrong:

  • +++ is symmetric choice: it tries all of its alternatives and keeps every result that succeeds. Since munch always succeeds (possibly consuming nothing), each of the three alternatives produced a result, and that is where the ambiguity in the parse comes from. <++ is left-biased, so only the left-most alternative that succeeds is used -> this would remove the ambiguity in the parse, but still leaves the next problem.

  • Your parser chose between a run of letters, a run of digits, or a run of underscores, so the rest of a name could only consist of one kind of character. A name that mixes them, such as test_1, could not be read as a single identifier, for example. The parser had to be modified to munch characters that are letters, digits, or underscores.
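To see the difference between the operators in isolation, here is a small sketch that parses only the tail of an identifier in three ways. The names isUnderscore, tailSymmetric, tailLeftBiased, tailCombined and results are made up for this illustration and are not part of the original code.

```haskell
import Data.Char (isDigit, isLetter)
import Text.ParserCombinators.ReadP

-- Hypothetical helper: True only for the underscore character.
isUnderscore :: Char -> Bool
isUnderscore = (== '_')

-- Symmetric choice: every alternative that succeeds contributes a result.
-- munch never fails, so this always yields three results.
tailSymmetric :: ReadP String
tailSymmetric = munch isLetter +++ munch isDigit +++ munch isUnderscore

-- Left-biased choice: munch isLetter always succeeds,
-- so the digit and underscore alternatives are never used.
tailLeftBiased :: ReadP String
tailLeftBiased = munch isLetter <++ munch isDigit <++ munch isUnderscore

-- A single munch over the union of the three character classes.
tailCombined :: ReadP String
tailCombined = munch (\c -> isLetter c || isDigit c || isUnderscore c)

-- All (result, remaining input) pairs for a parser and an input string.
results :: ReadP a -> String -> [(a, String)]
results = readP_to_S
```

In GHCi, results tailSymmetric "est_1" should return three pairs (the run "est" plus two empty strings), results tailLeftBiased "est_1" a single pair that stops before the underscore, and results tailCombined "est_1" a single pair that consumes the whole input.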

I also removed some functions that were unused and made an educated guess for the definition of your datatypes.
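As a possible variation, the lowercase check can be written with isAsciiLower from Data.Char instead of the hand-written elem test. The sketch below assumes that only ASCII lowercase letters may start a name; the names identifier and parseIdentifier are hypothetical and only serve this example.

```haskell
import Data.Char (isAsciiLower, isDigit, isLetter)
import Text.ParserCombinators.ReadP

-- First character: an ASCII lowercase letter.
-- Remaining characters: letters, digits or underscores.
identifier :: ReadP String
identifier = do
  c  <- satisfy isAsciiLower
  cs <- munch (\x -> isLetter x || isDigit x || x == '_')
  return (c : cs)

-- Parse exactly one identifier, allowing surrounding whitespace.
-- Returns Nothing when the input is not a single valid identifier.
parseIdentifier :: String -> Maybe String
parseIdentifier input =
  case readP_to_S (skipSpaces *> identifier <* skipSpaces <* eof) input of
    [(name, "")] -> Just name
    _            -> Nothing
```

With this sketch, parseIdentifier "test_1349_zefz" should give Just "test_1349_zefz", while inputs such as "Test" or "_x" should give Nothing because they do not start with a lowercase letter.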
