Haskell read variable name
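I need to write code that parses a certain language. I am stuck on parsing variable names: a name can be anything at least 1 character long, must start with a lowercase letter, and may contain the underscore character '_'. I think I have made a good start with the following code: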
identToken :: Parser String
identToken = do
    c <- letter
    cs <- letdigs
    return (c:cs)
  where
    letter = satisfy isLetter
    letdigs = munch isLetter +++ munch isDigit +++ munch underscore
    num = satisfy isDigit
    underscore = \x -> x == '_'
    lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?

ident :: Parser Ident
ident = do
    _ <- skipSpaces
    s <- identToken
    skipSpaces; return $ s

idents :: Parser Command
idents = do
    skipSpaces; ids <- many1 ident
    ...
However, this gives me a strange result. If I call my test function
test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p =
    case readP_to_S prog p of
        [(j, "")] -> Right j
        [] -> Left InvalidParse
        multipleRes -> Left (AmbiguousIdents multipleRes)
  where
    prog :: Parser [Ident]
    prog = do
        result <- many ident
        eof
        return result
like this:
test_parseIdents "test"
I get this:
Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
(["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])
Note that Parser is just a synonym for ReadP a.
I would also like to encode in the parser that variable names must begin with a lowercase letter.
Thank you for your help.
Part of the problem lies in your use of the +++ operator. The following code works for me:
import Data.Char
import Text.ParserCombinators.ReadP

type Parser a = ReadP a
type Ident = String

identToken :: Parser String
identToken = do c <- satisfy lowerCase
                cs <- letdigs
                return (c:cs)
  where lowerCase = \x -> x `elem` ['a'..'z']
        underscore = \x -> x == '_'
        letdigs = munch (\c -> isLetter c || isDigit c || underscore c)

ident :: Parser Ident
ident = do _ <- skipSpaces
           s <- identToken
           skipSpaces
           return s

test_parseIdents :: String -> Either String [Ident]
test_parseIdents p = case readP_to_S prog p of
    [(j, "")] -> Right j
    [] -> Left "Invalid parse"
    multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
  where prog :: Parser [Ident]
        prog = do result <- many ident
                  eof
                  return result

main = print $ test_parseIdents "test_1349_zefz"
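So what went wrong:
+++ is symmetric choice: both alternatives are tried, and every alternative that succeeds contributes a result. Because munch always succeeds (it may consume nothing at all), each of the three alternatives in letdigs yields a parse, and that is where the ambiguity comes from. <++ is left-biased, so only the leftmost successful alternative is kept -> this would remove the ambiguity in the parse, but it still leaves the next problem.
Your parser was looking for a run of letters, or a run of digits, or a run of underscores, but never a mix of them. A digit after an underscore would fail, for example. The parser had to be modified to munch characters that are either letters, digits, or underscores.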
I also removed some unused functions and made an educated guess at the definitions of your data types.
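To make the difference between the operators concrete, here is a minimal sketch that runs three versions of the "rest of the identifier" parser on the same input. The names restSymmetric, restLeftBiased and restCombined are made up for this illustration and do not appear in the code above; only munch, +++, <++ and readP_to_S come from Text.ParserCombinators.ReadP.

```haskell
import Data.Char (isDigit, isLetter)
import Text.ParserCombinators.ReadP

-- Symmetric choice, as in the question: every alternative that succeeds
-- contributes its own result. munch never fails, so all three do.
restSymmetric :: ReadP String
restSymmetric = munch isLetter +++ munch isDigit +++ munch (== '_')

-- Left-biased choice: only the leftmost successful alternative is kept,
-- which here is always the run of letters.
restLeftBiased :: ReadP String
restLeftBiased = munch isLetter <++ munch isDigit <++ munch (== '_')

-- A single munch over the combined predicate, as in the working code.
restCombined :: ReadP String
restCombined = munch (\c -> isLetter c || isDigit c || c == '_')

main :: IO ()
main = do
  -- Three results: one consumes "est", two consume nothing.
  print (readP_to_S restSymmetric "est_1")
  -- One result, but it stops after "est" and leaves "_1" unread.
  print (readP_to_S restLeftBiased "est_1")
  -- One result that consumes the whole input.
  print (readP_to_S restCombined "est_1")
```

With three candidate results for every identifier tail, many ident multiplies the possibilities at each step, which would explain the long list of duplicate parses in the question. The combined predicate gives exactly one result per identifier, so the outer parser has nothing left to be ambiguous about.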