Haskell - How to parse XML response into Haskell datatypes?

I'm a beginner trying to learn Haskell by working through some simple parsing problems. I have this XML file, which is an API response from Goodreads.

<GoodreadsResponse>
    <Request>
        <authentication>true</authentication>
        <key>API_KEY</key>
        <method>search_search</method>
    </Request>
    <search>
        <query>fantasy</query>
        <results-start>1</results-start>
        <results-end>20</results-end>
        <total-results>53297</total-results>
        <source>Goodreads</source>
        <query-time-seconds>0.15</query-time-seconds>
        <results>
            <work>
                <id type="integer">4640799</id>
                <books_count type="integer">640</books_count>
                <ratings_count type="integer">5640935</ratings_count>
                <text_reviews_count type="integer">90100</text_reviews_count>
                <original_publication_year type="integer">1997</original_publication_year>
                <original_publication_month type="integer">6</original_publication_month>
                <original_publication_day type="integer">26</original_publication_day>
                <average_rating>4.46</average_rating>
                <best_book type="Book">
                    <id type="integer">3</id>
                    <title>Harry Potter and the Sorcerer's Stone (Harry Potter, #1)</title>
                    <author>
                        <id type="integer">1077326</id>
                        <name>J.K. Rowling</name>
                    </author>
                    <image_url>https://images.gr-assets.com/books/1474154022m/3.jpg</image_url>
                    <small_image_url>https://images.gr-assets.com/books/1474154022s/3.jpg</small_image_url>
                </best_book>
            </work>
              ...
              ...
              ...
              ...

This is what I have so far:

{-# LANGUAGE DeriveGeneric #-}

module Lib where

import           Data.ByteString.Lazy (ByteString)
import           Data.Text            (Text)
import           GHC.Generics         (Generic)
import           Network.HTTP.Conduit (simpleHttp)
import           Text.Pretty.Simple   (pPrint)
import           Text.XML.Light

data GRequest = GRequest { authentication :: Text
                         , key            :: Text
                         , method         :: Text
                         }
              deriving (Generic, Show)

data GSearch = GSearch { query              :: Text
                       , results_start      :: Int
                       , results_end        :: Int
                       , total_results      :: Int
                       , source             :: Text
                       , query_time_seconds :: Float
                       , search_results     :: GResults
                       }
             deriving (Generic, Show)

data GResults = GResults { results :: [Work] }
              deriving (Generic, Show)


-- workID rather than id, so the field does not clash with Prelude's id
data Work = Work { workID                   :: Int
                 , booksCount               :: Int
                 , ratingsCount             :: Int
                 , textReviewsCount         :: Int
                 , originalPublicationYear  :: Int
                 , originalPublicationMonth :: Int
                 , originalPublicationDay   :: Int
                 , averageRating            :: Float
                 , bestBook                 :: Book
                 }
            deriving (Generic, Show)

data Book = Book { bID            :: Int
                 , bTitle         :: Text
                 , bAuthor        :: Author
                 , bImageURL      :: Maybe Text
                 , bSmallImageURL :: Maybe Text
                 }
            deriving (Generic, Show)


data Author = Author { authorID   :: Int
                     , authorName :: Text
                     }
              deriving (Generic, Show)


data GoodreadsResponse = GoodreadsResponse { request :: GRequest
                                           , search  :: GSearch
                                           }
                         deriving (Generic, Show)



main :: IO ()
main = do
  x <- simpleHttp apiString :: IO ByteString -- apiString is the API URL
  let listOfElements = onlyElems $ parseXML x
      filteredElements = concatMap (findElements (simpleName "work")) listOfElements
      simpleName s = QName s Nothing Nothing
  pPrint $ filteredElements

Ultimately, what I want to do is get each <work></work> element (from <results> .. </results>) into a Haskell type I can work with.

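But I'm not sure how to do that. I'm using the xml package to parse the response into its default types, but I don't know how to get it into my custom types.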

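The most relevant type to pattern match on is Content, together with its constructors (defined in Text.XML.Light.Types). That is, take the [Content] result returned by the parseXML function from Text.XML.Light.Input and pattern match on each individual Content value. You can mostly ignore the CRef data constructor and focus on Elem, because those are the XML tags you care about. The remaining constructor, Text, holds the non-XML strings found inside XML tags.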

For example, you would do something like the following:

#!/usr/bin/env stack
-- stack --resolver lts-12.24 --install-ghc runghc --package xml
import Text.XML.Light
import Data.Maybe

data MyXML =
    MyXML String [MyXML] -- Nested XML elements
  | Leaf  String         -- Leaves in the XML doc
  | Unit
  deriving (Show)

c2type :: Content -> Maybe MyXML
c2type (Text s) = Just $ Leaf $ cdData s
c2type (CRef _) = Nothing
c2type (Elem e) = Just $ MyXML (qName $ elName e) (mapMaybe c2type (elContent e))

main :: IO ()
main = do
  dat <- readFile "input.xml"
  let xml = parseXML dat
--  print xml
  print $ mapMaybe c2type xml

For the snippet above, say input.xml contains the following XML:

<work>
  <a>1</a>
  <b>2</b>
</work>

Running the example then produces:

$ ./xml.hs 
[MyXML "work" [Leaf "\n  ",MyXML "a" [Leaf "1"],Leaf "\n  ",MyXML "b" [Leaf "2"],Leaf "\n"],Leaf "\n"]
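
The whitespace-only Leaf entries in that output come from the indentation between tags. Below is a minimal sketch of one way to drop them, assuming the same MyXML shape as above (without the unused Unit constructor). The names isBlank and c2typeNoWs are hypothetical helpers introduced only for illustration.

```haskell
import Data.Char (isSpace)
import Data.Maybe (mapMaybe)
import Text.XML.Light

data MyXML
  = MyXML String [MyXML] -- Nested XML elements
  | Leaf String          -- Leaves in the XML doc
  deriving (Show, Eq)

-- True when a text node holds only whitespace (indentation, newlines).
isBlank :: String -> Bool
isBlank = all isSpace

-- Like c2type, but whitespace-only text nodes are discarded.
c2typeNoWs :: Content -> Maybe MyXML
c2typeNoWs (Text s)
  | isBlank (cdData s) = Nothing
  | otherwise          = Just (Leaf (cdData s))
c2typeNoWs (CRef _) = Nothing
c2typeNoWs (Elem e) =
  Just (MyXML (qName (elName e)) (mapMaybe c2typeNoWs (elContent e)))
```

Applied to the same input.xml with mapMaybe c2typeNoWs, this should leave only the work element with its a and b children.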

For broader use cases, the functions you will probably find most useful include:

(qName . elName) -- Get the name of a tag as a String from an Element
elContent -- Get the child contents of an Element (recurse over these to walk nested tags)
elAttribs -- Get the attributes of an Element, e.g. to check the 'type' attributes on some of your tags
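
To connect this to the custom types in the question, here is a hedged sketch of turning a <best_book> element into the Book and Author records. It uses findChild, strContent and unqual from the xml package rather than pattern matching on Content directly, which is a different technique from the c2type example. The helpers childText, childRead, parseAuthor and parseBook are hypothetical names, and the record definitions are copied from the question with Eq added so results can be compared.

```haskell
import qualified Data.Text as T
import Text.Read (readMaybe)
import Text.XML.Light

data Author = Author
  { authorID   :: Int
  , authorName :: T.Text
  } deriving (Show, Eq)

data Book = Book
  { bID            :: Int
  , bTitle         :: T.Text
  , bAuthor        :: Author
  , bImageURL      :: Maybe T.Text
  , bSmallImageURL :: Maybe T.Text
  } deriving (Show, Eq)

-- Text of the direct child with the given tag name, if that child exists.
childText :: String -> Element -> Maybe String
childText name el = strContent <$> findChild (unqual name) el

-- Like childText, but also parses the text with Read (e.g. into an Int).
childRead :: Read a => String -> Element -> Maybe a
childRead name el = childText name el >>= readMaybe

-- Expects an <author> element; Nothing if id or name is missing or malformed.
parseAuthor :: Element -> Maybe Author
parseAuthor el =
  Author <$> childRead "id" el
         <*> (T.pack <$> childText "name" el)

-- Expects a <best_book> element; the two image URLs are optional.
parseBook :: Element -> Maybe Book
parseBook el =
  Book <$> childRead "id" el
       <*> (T.pack <$> childText "title" el)
       <*> (findChild (unqual "author") el >>= parseAuthor)
       <*> pure (T.pack <$> childText "image_url" el)
       <*> pure (T.pack <$> childText "small_image_url" el)
```

Because findChild only inspects direct children, the book's own <id> is not confused with the <id> nested inside <author>. A parser for Work could be assembled the same way, using parseBook for the <best_book> child.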

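To get a feel for the general structure of the data the XML parser returns for your input, I highly recommend uncommenting the print xml line in the code sample above and inspecting the list of Content values it prints on the command line. That alone should tell you exactly which fields you care about. For example, this is what it prints for my minimal XML input: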

[Elem (Element {elName = QName {qName = "work", qURI = Nothing, qPrefix = Nothing}, elAttribs = [], elContent = [Text (CData {cdVerbatim = CDataText, cdData = "\n  ", cdLine = Just 1}),Elem (Element {elName = QName {qName = "a", qURI = Nothing, qPrefix = Nothing}, elAttribs = [], elContent = [Text (CData {cdVerbatim = CDataText, cdData = "1", cdLine = Just 2})], elLine = Just 2}),Text (CData {cdVerbatim = CDataText, cdData = "\n  ", cdLine = Just 2}),Elem (Element {elName = QName {qName = "b", qURI = Nothing, qPrefix = Nothing}, elAttribs = [], elContent = [Text (CData {cdVerbatim = CDataText, cdData = "2", cdLine = Just 3})], elLine = Just 3}),Text (CData {cdVerbatim = CDataText, cdData = "\n", cdLine = Just 3})], elLine = Just 1}),Text (CData {cdVerbatim = CDataText, cdData = "\n", cdLine = Just 4})]
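
In this minimal input every elAttribs list is empty, but the tags in the Goodreads response carry attributes such as type="integer". As a small sketch of how those could be checked, assuming you only want to accept a value when the tag is explicitly marked as an integer (typedInt is a hypothetical helper name):

```haskell
import Text.Read (readMaybe)
import Text.XML.Light

-- Read the element's text as an Int, but only when the element
-- carries the attribute type="integer"; otherwise return Nothing.
typedInt :: Element -> Maybe Int
typedInt el = do
  ty <- findAttr (unqual "type") el
  if ty == "integer"
    then readMaybe (strContent el)
    else Nothing
```

Elements without a type attribute, such as <average_rating>, are rejected by this helper and would need their own parser.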

