Parsing chemical compounds in Haskell

I'm trying to write a chemical compound parser for myself as an exercise, but I'm stuck.

This is the data type I'm trying to use:

data Compound = Monoatomic String Int | Poliatomic [Compound] Int

Given a string like "Ca(OH)2", I want to get something like this:

Poliatomic [Monoatomic "Ca" 1, Poliatomic [Monoatomic "O" 1, Monoatomic "H" 1] 2 ] 1

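The Monoatomic constructor is for single atoms, and the Poliatomic constructor is for groups of several atoms. In this example, (OH)2 is an inner Poliatomic structure, represented as Poliatomic [Monoatomic "O" 1, Monoatomic "H" 1] 2. The number 2 means that we have two of that polyatomic structure.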
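To make the meaning of the counts concrete, here is a minimal sketch of the target value for "Ca(OH)2" together with a small function that multiplies each group's count into its members. The names calciumHydroxide and atomCounts are hypothetical helpers introduced for this illustration; they are not part of the question's code.

```haskell
-- The question's data type, with Show and Eq derived so that values can
-- be printed and compared.
data Compound
  = Monoatomic String Int
  | Poliatomic [Compound] Int
  deriving (Show, Eq)

-- The value the question expects for "Ca(OH)2".
calciumHydroxide :: Compound
calciumHydroxide =
  Poliatomic
    [ Monoatomic "Ca" 1
    , Poliatomic [Monoatomic "O" 1, Monoatomic "H" 1] 2
    ]
    1

-- Hypothetical helper: list every atom with its total count, multiplying
-- each group's count into the counts of its members.
-- Repeated elements are NOT merged; each occurrence gets its own entry.
atomCounts :: Compound -> [(String, Int)]
atomCounts (Monoatomic name n) = [(name, n)]
atomCounts (Poliatomic parts n) =
  [ (name, n * k) | part <- parts, (name, k) <- atomCounts part ]
```

Loaded into GHCi, atomCounts calciumHydroxide evaluates to [("Ca",1),("O",2),("H",2)]: the outer count of 1 leaves "Ca" unchanged, while the inner count of 2 doubles both "O" and "H".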

This is how far I got:

import Data.Char (isUpper)
data Compound = Monoatomic String Int | Poliatomic [Compound] Int

instance Functor Compound where
        fmap f (Monoatomic s i) = Monoatomic (f s) i
        fmap f (Poliatomic xs i) = Poliatomic (fmap f xs) i

-- Change number of a compound
changeNumber :: Compound -> Int -> Compound
changeNumber (Monoatomic xs _) n = Monoatomic xs n
changeNumber (Poliatomic xs _) n = Poliatomic xs n

-- Take a partial compound and the next character; return the updated partial compound
parseCompound :: Compound -> Char -> Compound
parseCompound (Poliatomic x:xs n) c
        | isUpper c = Poliatomic ((Monoatomic [c] 1):x:xs) n -- add new atom to compound
        | isLower c = Poliatomic 

-- I want to do foldl parseCompound (Poliatomic [] 1) inputstring

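But continuing from here got too complicated.

It seems like it should be a fairly simple problem, but I'm very new to Haskell and can't figure out how to finish this function.

I have these questions:

  • Is my approach so far correct?
  • How can I get this to work?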

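I've written the parser you're looking for using Parsec, so that you can see what a Parsec parser looks like, since you said you don't have much experience with it.

Even with little Haskell experience, it should be fairly readable. I've added comments on the parts that deserve special attention.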

import Text.Read (readMaybe)
import Data.Maybe (fromMaybe)
import Text.Parsec (parse, many, many1, digit, char, string, (<|>), choice, try)
import Text.Parsec.String (Parser)


data Compound
  = Monoatomic String Int
  | Poliatomic [Compound] Int
  deriving Show


-- Run the substance parser on "Ca(OH)2" and print the result which is
-- Right (Poliatomic [Monoatomic "Ca" 1,Poliatomic [Monoatomic "O" 1,Monoatomic "H" 1] 2] 1)
main = print (parse substance "" "Ca(OH)2")


-- parse the many parts which make up the top-level polyatomic compound
--
-- "many1" means "at least one"
substance :: Parser Compound
substance = do
  topLevel <- many1 part
  return (Poliatomic topLevel 1)


-- a single part in a substance is either a poliatomic compound or a monoatomic compound
part :: Parser Compound
part = poliatomic <|> monoatomic


-- a poliatomic compound starts with a '(', then has many parts inside, then
-- ends with ')' and has a number after it which indicates how many of it there
-- are.
poliatomic :: Parser Compound
poliatomic = do
  char '('
  inner <- many1 part
  char ')'
  amount <- many1 digit
  return (Poliatomic inner (read amount))


-- a monoatomic compound is one of the many element names, followed by an
-- optional number. if omitted, the amount defaults to 1.
--
-- "try" is a little special, and required in this case. it means "if a parser
-- fails, try the next one from where you started, not from where the last one
-- failed."
--
-- "choice" means "try all parsers in this list, stop when one matches"
--
-- "many" means "zero or more"
monoatomic :: Parser Compound
monoatomic = do
  name <- choice [try nameParser | nameParser <- atomstrings]
  amount <- many digit
  return (Monoatomic name (fromMaybe 1 (readMaybe amount)))


-- a list of parsers for atom names. it is IMPORTANT that a longer name comes
-- before any shorter name it starts with (e.g. "Ca" before "C"). "choice"
-- stops at the first parser that matches, so with "C" first, "Ca" would be
-- read as "C" followed by a stray 'a'. ordering the list avoids extra logic.
atomstrings :: [Parser String]
atomstrings = map string (words "He Li Be Ne Na Mg Al Ca H B C N O F")

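I tried to write this code in a way that should be at least reasonably accessible to a beginner, but it may not be entirely clear, so I'm happy to answer any questions about it.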
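The comments on "try" and on the ordering of the element names can be checked in isolation. The following is a minimal sketch under stated assumptions: elementFrom, longestFirst, shortestFirst, shortestFirstWhole, withoutTry, withTry and run are hypothetical names introduced only for this illustration, and the lists are cut down to two names each.

```haskell
import Text.Parsec (parse, choice, try, string, eof)
import Text.Parsec.String (Parser)

-- Hypothetical helper: build an element-name parser from a list of names,
-- trying them in the given order.
elementFrom :: [String] -> Parser String
elementFrom names = choice [try (string n) | n <- names]

-- "Ca" listed before "C": the longer name gets the first chance to match.
longestFirst :: Parser String
longestFirst = elementFrom ["Ca", "C"]

-- "C" listed before "Ca": "C" succeeds first, so "Ca" is never reached.
shortestFirst :: Parser String
shortestFirst = elementFrom ["C", "Ca"]

-- The same parser, but additionally requiring that all input is consumed.
shortestFirstWhole :: Parser String
shortestFirstWhole = shortestFirst <* eof

-- Without "try": string "He" consumes the 'H' of "H2" before failing on
-- '2', and after consuming input "choice" does not move on to the next
-- alternative.
withoutTry :: Parser String
withoutTry = choice (map string ["He", "H"])

-- With "try": the failed attempt is rewound, so "H" gets its turn.
withTry :: Parser String
withTry = elementFrom ["He", "H"]

-- Run a parser and return its result, or Nothing on a parse error.
run :: Parser a -> String -> Maybe a
run p input = either (const Nothing) Just (parse p "" input)
```

In GHCi, run longestFirst "Ca" gives Just "Ca", whereas run shortestFirst "Ca" gives Just "C" and run shortestFirstWhole "Ca" gives Nothing, because the 'a' is left over. Likewise, run withoutTry "H2" gives Nothing, while run withTry "H2" gives Just "H".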


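The Compound parser above is what you asked for. However, it's not the one I would have written if I had free rein. If I could do it my way, I would take advantage of the fact that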

Ca(OH)2

can be represented as

(Ca)1((O)1(H)1)2

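which is a more uniform representation, which in turn leads to a simpler data structure and a parser with less boilerplate. The code I'd want to write looks like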

import Text.Read (readMaybe)
import Data.Maybe (fromMaybe)
import Control.Applicative ((<$>), (<*>), pure)
import Text.Parsec (parse, many, many1, digit, char, string, (<|>), choice, try, between)
import Text.Parsec.String (Parser)


data Substance
  = Part [Substance] Int
  | Atom String
  deriving Show


main = print (parse substance "" "Ca(OH)2")
-- Right (Part [Part [Atom "Ca"] 1,Part [Part [Atom "O"] 1,Part [Atom "H"] 1] 2] 1)

substance :: Parser Substance
substance = Part <$> many1 part <*> pure 1

part :: Parser Substance
part = do
  inner <- polyatomic <|> monoatomic
  amount <- fromMaybe 1 . readMaybe <$> many digit
  return (Part inner amount)

polyatomic :: Parser [Substance]
polyatomic = between (char '(') (char ')') (many1 part)

monoatomic :: Parser [Substance]
monoatomic = (:[]) . Atom <$> choice (map (try . string) atomstrings)

atomstrings :: [String]
atomstrings = words "He Li Be Ne Na Mg Al Ca H B C N O F"

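This uses a few "advanced" tricks in Haskell (such as the <$> and <*> operators), so it might not be of interest to you, OP, but I'm putting it here for others who may be more advanced Haskell users and are learning about Parsec.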
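For readers who haven't met <$> and <*> before, here is a minimal sketch that writes one small parser in both styles. The Counted type and the names countedDo, countedAp, countedOnce and run are hypothetical, made up for this illustration; only the Parsec functions are real.

```haskell
import Text.Parsec (parse, many1, digit, letter)
import Text.Parsec.String (Parser)

-- A small stand-in type for this illustration: a name and a count.
data Counted = Counted String Int
  deriving (Show, Eq)

-- do-notation: run each parser in turn, name the results, then combine.
countedDo :: Parser Counted
countedDo = do
  name   <- many1 letter
  amount <- many1 digit
  return (Counted name (read amount))

-- Applicative style, same behaviour: "f <$> p <*> q" runs p, then q, and
-- applies f to the two results.
countedAp :: Parser Counted
countedAp = Counted <$> many1 letter <*> (read <$> many1 digit)

-- "pure x" is a parser that consumes nothing and returns x. This is how
-- "substance" above supplies its fixed count of 1.
countedOnce :: Parser Counted
countedOnce = Counted <$> many1 letter <*> pure 1

-- Run a parser and return its result, or Nothing on a parse error.
run :: Parser a -> String -> Maybe a
run p input = either (const Nothing) Just (parse p "" input)
```

In GHCi, run countedDo "Ca2" and run countedAp "Ca2" both give Just (Counted "Ca" 2), while run countedOnce "Ca2" gives Just (Counted "Ca" 1) because it never looks at the digit.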

As you can see, this parser takes up only about half a page, and that's the power of libraries like Parsec: they make writing parsers easy and fun!
