Haskell - How to parse XML response into Haskell datatypes?
I'm a beginner trying to learn Haskell by working through some simple parsing problems. I have this XML file, which is a response from the Goodreads API.
<GoodreadsResponse>
<Request>
<authentication>true</authentication>
<key>API_KEY</key>
<method>search_search</method>
</Request>
<search>
<query>fantasy</query>
<results-start>1</results-start>
<results-end>20</results-end>
<total-results>53297</total-results>
<source>Goodreads</source>
<query-time-seconds>0.15</query-time-seconds>
<results>
<work>
<id type="integer">4640799</id>
<books_count type="integer">640</books_count>
<ratings_count type="integer">5640935</ratings_count>
<text_reviews_count type="integer">90100</text_reviews_count>
<original_publication_year type="integer">1997</original_publication_year>
<original_publication_month type="integer">6</original_publication_month>
<original_publication_day type="integer">26</original_publication_day>
<average_rating>4.46</average_rating>
<best_book type="Book">
<id type="integer">3</id>
<title>Harry Potter and the Sorcerer's Stone (Harry Potter, #1)</title>
<author>
<id type="integer">1077326</id>
<name>J.K. Rowling</name>
</author>
<image_url>https://images.gr-assets.com/books/1474154022m/3.jpg</image_url>
<small_image_url>https://images.gr-assets.com/books/1474154022s/3.jpg</small_image_url>
</best_book>
</work>
...
...
...
...
This is what I have so far:
{-# LANGUAGE DeriveGeneric #-}
module Lib where

import Data.ByteString.Lazy (ByteString)
import Data.Text (Text)
import GHC.Generics (Generic)
import Network.HTTP.Conduit (simpleHttp)
import Text.Pretty.Simple (pPrint)
import Text.XML.Light

data GRequest = GRequest { authentication :: Text
                         , key            :: Text
                         , method         :: Text
                         }
  deriving (Generic, Show)

data GSearch = GSearch { query              :: Text
                       , results_start      :: Int
                       , results_end        :: Int
                       , total_results      :: Int
                       , source             :: Text
                       , query_time_seconds :: Float
                       , search_results     :: GResults
                       }
  deriving (Generic, Show)

data GResults = GResults { results :: [Work] }
  deriving (Generic, Show)

data Work = Work { id                       :: Int
                 , booksCount               :: Int
                 , ratingsCount             :: Int
                 , text_reviewsCount        :: Int
                 , originalPublicationYear  :: Int
                 , originalPublicationMonth :: Int
                 , originalPublicationDay   :: Int
                 , averageRating            :: Float
                 , bestBook                 :: Book
                 }
  deriving (Generic, Show)

data Book = Book { bID            :: Int
                 , bTitle         :: Text
                 , bAuthor        :: Author
                 , bImageURL      :: Maybe Text
                 , bSmallImageURL :: Maybe Text
                 }
  deriving (Generic, Show)

data Author = Author { authorID   :: Int
                     , authorName :: Text
                     }
  deriving (Generic, Show)

data GoodreadsResponse = GoodreadsResponse { request :: GRequest
                                           , search  :: GSearch
                                           }
  deriving (Generic, Show)

main :: IO ()
main = do
  x <- simpleHttp apiString :: IO ByteString -- apiString is the API URL
  let listOfElements   = onlyElems $ parseXML x
      filteredElements = concatMap (findElements (simpleName "work")) listOfElements
      simpleName s     = QName s Nothing Nothing
  pPrint $ filteredElements
Ultimately, what I want to do is put every part of each <work></work> element (from <results> .. </results>) into Haskell types I can actually work with.
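But I'm not sure how to do that. I'm using the xml package to parse the response into its default types, but I don't know how to get from there to my custom types.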
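The most relevant types to pattern match on are the ones defined in Text.XML.Light.Types. That is, take the [Content] that the parseXML function from Text.XML.Light.Input returns and pattern match on each individual Content. Mostly ignore the CRef data constructor and focus on Elem, because those are the XML tags you care about (apart from the Text constructor, which holds the non-XML strings found inside XML tags).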
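In its smallest form, that pattern match looks like the sketch below. It assumes only the xml package; `elemsOnly` and `describe` are hypothetical helper names, and `elemsOnly` does the same job as the library's own `onlyElems` that the question already uses.

```haskell
import Text.XML.Light

-- Hypothetical helper: keep only the element nodes from parser output,
-- dropping Text and CRef. Same effect as the library's 'onlyElems'.
elemsOnly :: [Content] -> [Element]
elemsOnly cs = [e | Elem e <- cs]

-- Describe each Content constructor, to show what parseXML hands back.
describe :: Content -> String
describe (Elem e) = "element <" ++ qName (elName e) ++ ">"
describe (Text t) = "text " ++ show (cdData t)
describe (CRef r) = "entity reference &" ++ r ++ ";"

main :: IO ()
main = do
  let contents = parseXML "<a>1</a> <b>2</b>"
  mapM_ (putStrLn . describe) contents
  -- Only the tag names of the Elem values survive the filter.
  print (map (qName . elName) (elemsOnly contents))
```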
For example, you would do something like this:
#!/usr/bin/env stack
-- stack --resolver lts-12.24 --install-ghc runghc --package xml
import Text.XML.Light
import Data.Maybe

data MyXML =
    MyXML String [MyXML] -- Nested XML elements
  | Leaf String          -- Leaves in the XML doc
  | Unit
  deriving (Show)

c2type :: Content -> Maybe MyXML
c2type (Text s) = Just $ Leaf $ cdData s
c2type (CRef _) = Nothing
c2type (Elem e) = Just $ MyXML (qName $ elName e) (mapMaybe c2type (elContent e))

main :: IO ()
main = do
  dat <- readFile "input.xml"
  let xml = parseXML dat
  -- print xml
  print $ mapMaybe c2type xml
For the snippet above, suppose input.xml contains the following XML:
<work>
<a>1</a>
<b>2</b>
</work>
Running the example then produces:
$ ./xml.hs
[MyXML "work" [Leaf "\n ",MyXML "a" [Leaf "1"],Leaf "\n ",MyXML "b" [Leaf "2"],Leaf "\n"],Leaf "\n"]
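The `Leaf "\n "` entries in that output are just the indentation between tags. If you don't need them, one option (a sketch, not part of the original answer) is to have `c2type` drop whitespace-only text. This reuses the answer's `MyXML` type without the unused `Unit` constructor, adds `Eq` for comparisons, and inlines the sample document instead of reading input.xml.

```haskell
import Data.Char (isSpace)
import Data.Maybe (mapMaybe)
import Text.XML.Light

data MyXML
  = MyXML String [MyXML] -- Nested XML elements
  | Leaf String          -- Text leaves in the XML doc
  deriving (Show, Eq)

-- Like the answer's c2type, but whitespace-only text (the indentation
-- between tags) is discarded instead of becoming a Leaf.
c2type :: Content -> Maybe MyXML
c2type (Text s)
  | all isSpace (cdData s) = Nothing
  | otherwise              = Just (Leaf (cdData s))
c2type (CRef _) = Nothing
c2type (Elem e) = Just (MyXML (qName (elName e)) (mapMaybe c2type (elContent e)))

main :: IO ()
main = do
  -- The answer's input.xml sample, inlined so the sketch runs on its own.
  let dat = "<work>\n  <a>1</a>\n  <b>2</b>\n</work>\n"
  print (mapMaybe c2type (parseXML dat))
```

This should print `[MyXML "work" [MyXML "a" [Leaf "1"],MyXML "b" [Leaf "2"]]]`. Note that non-blank text is still kept verbatim, including any leading or trailing spaces.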
For broader use cases, the functions you will probably find most useful include:
(qName . elName) -- Get the tag name of an Element as a String
elContent        -- Get the immediate child Content (elements, text, entity refs) of an Element
elAttribs        -- Get an Element's attributes, e.g. to check the 'type' attributes on some of your tags
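As a rough illustration of those three functions together, the sketch below reads the number out of an element such as `<books_count type="integer">640</books_count>`, but only when its `type` attribute says `integer`. `typedInt` is a hypothetical helper name, and treating the attribute as a guard is an assumption about how you might want to use it.

```haskell
import Text.Read (readMaybe)
import Text.XML.Light

-- Hypothetical helper: the Int inside an element whose 'type'
-- attribute is "integer"; Nothing otherwise or if the text isn't a number.
typedInt :: Element -> Maybe Int
typedInt e
  | isInteger = readMaybe (concat [cdData t | Text t <- elContent e])
  | otherwise = Nothing
  where
    isInteger = any (\a -> qName (attrKey a) == "type" && attrVal a == "integer")
                    (elAttribs e)

main :: IO ()
main = do
  let xml = "<work><books_count type=\"integer\">640</books_count>"
         ++ "<average_rating>4.46</average_rating></work>"
  case parseXMLDoc xml of
    Nothing   -> putStrLn "could not parse"
    Just work ->
      -- Print each child's tag name next to its integer value, if any.
      mapM_ (\c -> putStrLn (qName (elName c) ++ ": " ++ show (typedInt c)))
            (elChildren work)
```

The xml package also offers `findAttr (unqual "type") e` if you prefer looking up a single attribute by name over scanning `elAttribs` yourself.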
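To see the general structure of the data types the XML parser returns for your input, I strongly recommend uncommenting the `print xml` line in the code sample above and inspecting the list of contents it prints on the command line. That alone will tell you exactly which fields you care about. For example, here is the output for my minimal XML input: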
[Elem (Element {elName = QName {qName = "work", qURI = Nothing, qPrefix = Nothing}, elAttribs = [], elContent = [Text (CData {cdVerbatim = CDataText, cdData = "\n ", cdLine = Just 1}),Elem (Element {elName = QName {qName = "a", qURI = Nothing, qPrefix = Nothing}, elAttribs = [], elContent = [Text (CData {cdVerbatim = CDataText, cdData = "1", cdLine = Just 2})], elLine = Just 2}),Text (CData {cdVerbatim = CDataText, cdData = "\n ", cdLine = Just 2}),Elem (Element {elName = QName {qName = "b", qURI = Nothing, qPrefix = Nothing}, elAttribs = [], elContent = [Text (CData {cdVerbatim = CDataText, cdData = "2", cdLine = Just 3})], elLine = Just 3}),Text (CData {cdVerbatim = CDataText, cdData = "\n", cdLine = Just 3})], elLine = Just 1}),Text (CData {cdVerbatim = CDataText, cdData = "\n", cdLine = Just 4})]
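Once the structure is clear, the same building blocks (plus `findChild`, `strContent`, `findElements` and `unqual` from the xml package) are enough to fill in records like the ones in the question. The sketch below is an assumption-laden cut-down version: it covers only a few fields, renames the work's `id` to `workId` to avoid clashing with Prelude's `id`, and uses `String` instead of `Text` to stay within the xml package's types. A missing or malformed field turns that whole `<work>` into `Nothing`.

```haskell
import Text.Read (readMaybe)
import Text.XML.Light

-- Trimmed, hypothetical versions of the question's records.
data Author = Author { authorID :: Int, authorName :: String }
  deriving (Show, Eq)

data Book = Book { bID :: Int, bTitle :: String, bAuthor :: Author }
  deriving (Show, Eq)

data Work = Work { workId :: Int, averageRating :: Float, bestBook :: Book }
  deriving (Show, Eq)

-- Text of the named direct child, e.g. child "name" e for <name>...</name>.
child :: String -> Element -> Maybe String
child name e = strContent <$> findChild (unqual name) e

-- Same, parsed with readMaybe; Nothing if absent or malformed.
readChild :: Read a => String -> Element -> Maybe a
readChild name e = child name e >>= readMaybe

parseAuthor :: Element -> Maybe Author
parseAuthor e = Author <$> readChild "id" e <*> child "name" e

parseBook :: Element -> Maybe Book
parseBook e =
  Book <$> readChild "id" e
       <*> child "title" e
       <*> (findChild (unqual "author") e >>= parseAuthor)

parseWork :: Element -> Maybe Work
parseWork e =
  Work <$> readChild "id" e
       <*> readChild "average_rating" e
       <*> (findChild (unqual "best_book") e >>= parseBook)

-- Every <work> element anywhere in the document, as in the question.
parseWorks :: [Content] -> [Maybe Work]
parseWorks cs = map parseWork (concatMap (findElements (unqual "work")) (onlyElems cs))

main :: IO ()
main = do
  let sample = concat
        [ "<results><work>"
        , "<id type=\"integer\">4640799</id>"
        , "<average_rating>4.46</average_rating>"
        , "<best_book type=\"Book\"><id type=\"integer\">3</id>"
        , "<title>Harry Potter and the Sorcerer's Stone (Harry Potter, #1)</title>"
        , "<author><id type=\"integer\">1077326</id><name>J.K. Rowling</name></author>"
        , "</best_book></work></results>"
        ]
  mapM_ print (parseWorks (parseXML sample))
```

Because `findChild` only looks at direct children, the work's `<id>` and the book's `<id>` don't get mixed up. If you would rather keep partial records, make the optional fields `Maybe` (as the question does for the image URLs) and skip the `>>=` for them.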