
Haskell read variable name

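I need to write code that parses some language. I am stuck on parsing variable names: a name can be anything at least 1 character long, must start with a lowercase letter, and may contain the underscore character '_'. I think I made a good start with the following code: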

identToken :: Parser String
identToken = do 
                       c <- letter
                       cs <- letdigs
                       return (c:cs)
             where letter = satisfy isLetter
                   letdigs = munch isLetter +++ munch isDigit +++ munch underscore
                   num = satisfy isDigit
                   underscore = \x -> x == '_'
                   lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?

ident :: Parser Ident
ident = do 
          _ <- skipSpaces
          s <- identToken
          skipSpaces; return $ s

idents :: Parser Command
idents = do 
          skipSpaces; ids <- many1 ident
          ...

However, this gives me a strange result. If I call the test function

test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p = 
  case readP_to_S prog p of
    [(j, "")] -> Right j
    [] -> Left InvalidParse
    multipleRes -> Left (AmbiguousIdents multipleRes)
  where
    prog :: Parser [Ident]
    prog = do
      result <- many ident
      eof
      return result

like this:

test_parseIdents  "test"

I get this:

Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
    (["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])

Note that Parser a is just a type synonym for ReadP a.

I would also like to encode in the parser that variable names must start with a lowercase letter.

Thank you for your help.

Part of the problem is your use of the +++ operator. The following code works for me:

import Data.Char
import Text.ParserCombinators.ReadP

type Parser a = ReadP a
type Ident = String

identToken :: Parser String
identToken = do c <- satisfy lowerCase
                cs <- letdigs
                return (c:cs)
  where lowerCase = \x -> x `elem` ['a'..'z']
        underscore = \x -> x == '_'
        letdigs = munch (\c -> isLetter c || isDigit c || underscore c)

ident :: Parser Ident
ident = do _ <- skipSpaces
           s <- identToken
           skipSpaces
           return s

test_parseIdents :: String -> Either String [Ident]
test_parseIdents p = case readP_to_S prog p of
    [(j, "")]   -> Right j
    []          -> Left "Invalid parse"
    multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
  where prog :: Parser [Ident]
        prog = do result <- many ident
                  eof
                  return result

main :: IO ()
main = print $ test_parseIdents "test_1349_zefz"

So what went wrong:

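  • +++ is symmetric choice: it does not prefer either argument and lets every alternative that succeeds contribute a result. Because munch always succeeds (possibly consuming nothing), all three alternatives in letdigs succeed, which produces the ambiguous parses. <++ is left-biased, so the right-hand alternative is only used when the left-hand one produces no result at all. This would remove the ambiguity in the parse, but still leaves the next problem.

  • The parser accepts a run of letters, or a run of digits, or a run of underscores, but never a mixture. A digit after an underscore, for example, would fail. The parser has to be changed to munch characters that are letters, digits, or underscores.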
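To make the difference between the two choice operators concrete, here is a small sketch that runs three variants of letdigs on their own. The names symmetric, leftBiased, and combined are made up for this illustration and do not appear in the question.

```haskell
import Data.Char (isDigit, isLetter)
import Text.ParserCombinators.ReadP

-- Symmetric choice: every alternative that succeeds yields its own result.
-- munch never fails, so all three alternatives always contribute.
symmetric :: ReadP String
symmetric = munch isLetter +++ munch isDigit +++ munch (== '_')

-- Left-biased choice: a later alternative is used only when the earlier
-- one yields no result. munch isLetter always yields one, so the digit
-- and underscore alternatives are never reached.
leftBiased :: ReadP String
leftBiased = munch isLetter <++ munch isDigit <++ munch (== '_')

-- A single munch over the combined predicate: one result, mixed characters allowed.
combined :: ReadP String
combined = munch (\c -> isLetter c || isDigit c || c == '_')

main :: IO ()
main = do
  print (length (readP_to_S symmetric "est"))  -- number of alternative parses
  print (readP_to_S leftBiased "est")
  print (readP_to_S leftBiased "123")          -- consumes nothing: letters win with ""
  print (readP_to_S combined "est_1")
```

Running this should show several results for symmetric, a single result for leftBiased that never gets past the letters, and a single result for combined that covers the whole input.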

I also removed some unused functions and made an educated guess at the definitions of your data types.
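As a quick check of the lowercase-first rule, here is a compact variant of the same parser run against a few inputs. It is only a sketch: parseIdents is a hypothetical helper that returns Maybe instead of your Error type, and isAsciiLower from Data.Char stands in for the hand-written lowerCase predicate.

```haskell
import Data.Char (isAsciiLower, isDigit, isLetter)
import Text.ParserCombinators.ReadP

-- First character must be an ASCII lowercase letter; the rest may be
-- any mix of letters, digits, and underscores.
identToken :: ReadP String
identToken = do
  c  <- satisfy isAsciiLower
  cs <- munch (\x -> isLetter x || isDigit x || x == '_')
  return (c : cs)

-- An identifier surrounded by optional whitespace.
ident :: ReadP String
ident = skipSpaces *> identToken <* skipSpaces

-- Hypothetical helper: Just the identifiers when there is exactly one
-- complete parse, Nothing otherwise.
parseIdents :: String -> Maybe [String]
parseIdents s =
  case readP_to_S (many ident <* eof) s of
    [(ids, "")] -> Just ids
    _           -> Nothing

main :: IO ()
main = mapM_ (print . parseIdents) ["foo bar_2", "x_1_y", "Foo", "_x"]
```

The first two inputs should parse, while "Foo" and "_x" should be rejected because they do not start with a lowercase letter.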
