
Haskell - How to parse XML response into Haskell datatypes?

I'm a beginner trying to learn Haskell by working through some simple parsing problems. I have this XML file, which is a Goodreads API response.

<GoodreadsResponse>
    <Request>
        <authentication>true</authentication>
        <key>API_KEY</key>
        <method>search_search</method>
    </Request>
    <search>
        <query>fantasy</query>
        <results-start>1</results-start>
        <results-end>20</results-end>
        <total-results>53297</total-results>
        <source>Goodreads</source>
        <query-time-seconds>0.15</query-time-seconds>
        <results>
            <work>
                <id type="integer">4640799</id>
                <books_count type="integer">640</books_count>
                <ratings_count type="integer">5640935</ratings_count>
                <text_reviews_count type="integer">90100</text_reviews_count>
                <original_publication_year type="integer">1997</original_publication_year>
                <original_publication_month type="integer">6</original_publication_month>
                <original_publication_day type="integer">26</original_publication_day>
                <average_rating>4.46</average_rating>
                <best_book type="Book">
                    <id type="integer">3</id>
                    <title>Harry Potter and the Sorcerer's Stone (Harry Potter, #1)</title>
                    <author>
                        <id type="integer">1077326</id>
                        <name>J.K. Rowling</name>
                    </author>
                    <image_url>https://images.gr-assets.com/books/1474154022m/3.jpg</image_url>
                    <small_image_url>https://images.gr-assets.com/books/1474154022s/3.jpg</small_image_url>
                </best_book>
            </work>
              ...
              ...
              ...
              ...

This is what I've got so far:

{-# LANGUAGE DeriveGeneric #-}

module Lib where

import           Data.ByteString.Lazy (ByteString)
import           Data.Text            (Text)
import           GHC.Generics         (Generic)
import           Network.HTTP.Conduit (simpleHttp)
import           Text.Pretty.Simple   (pPrint)
import           Text.XML.Light

data GRequest = GRequest { authentication :: Text
                         , key            :: Text
                         , method         :: Text
                         }
              deriving (Generic, Show)

data GSearch = GSearch { query              :: Text
                       , results_start      :: Int
                       , results_end        :: Int
                       , total_results      :: Int
                       , source             :: Text
                       , query_time_seconds :: Float
                       , search_results     :: GResults
                       }
             deriving (Generic, Show)

data GResults = GResults { results :: [Work] }
              deriving (Generic, Show)


data Work = Work { id                       :: Int
                 , booksCount               :: Int
                 , ratingsCount             :: Int
                 , text_reviewsCount        :: Int
                 , originalPublicationYear  :: Int
                 , originalPublicationMonth :: Int
                 , originalPublicationDay   :: Int
                 , averageRating            :: Float
                 , bestBook                 :: Book
                 }
            deriving (Generic, Show)

data Book = Book { bID            :: Int
                 , bTitle         :: Text
                 , bAuthor        :: Author
                 , bImageURL      :: Maybe Text
                 , bSmallImageURL :: Maybe Text
                 }
            deriving (Generic, Show)


data Author = Author { authorID   :: Int
                     , authorName :: Text
                     }
              deriving (Generic, Show)


data GoodreadsResponse = GoodreadsResponse { request :: GRequest
                                           , search  :: GSearch
                                           }
                         deriving (Generic, Show)



main :: IO ()
main = do
  x <- simpleHttp apiString :: IO ByteString -- apiString is the API URL
  let listOfElements = onlyElems $ parseXML x
      filteredElements = concatMap (findElements (simpleName "work")) listOfElements
      simpleName s = QName s Nothing Nothing
  pPrint $ filteredElements

Ultimately, I want to put everything inside each <work></work> element (from <results> .. </results>) into Haskell types I can work with.

But I'm not sure how to go about it. I'm using the xml package to parse the response into its default types, but I don't know how to get from those into my custom types.

The most pertinent types to pattern match on are the ones the xml package uses for parsed content (see the package documentation for Content and Element). Namely, take the [Content] list that the parseXML function from Text.XML.Light.Input returns and pattern match on each individual Content value. You can mostly ignore the CRef data constructor and focus on Elem, because Elems are the XML tags you care about, along with the Text constructor, which holds the character data found inside a tag.

For example, you'll want to do something like the following:

#!/usr/bin/env stack
-- stack --resolver lts-12.24 --install-ghc runghc --package xml
import Text.XML.Light
import Data.Maybe

data MyXML =
    MyXML String [MyXML] -- Nested XML elements
  | Leaf  String         -- Leaves in the XML doc
  | Unit
  deriving (Show)

c2type :: Content -> Maybe MyXML
c2type (Text s) = Just $ Leaf $ cdData s
c2type (CRef _) = Nothing
c2type (Elem e) = Just $ MyXML (qName $ elName e) (mapMaybe c2type (elContent e))

main :: IO ()
main = do
  dat <- readFile "input.xml"
  let xml = parseXML dat
--  print xml
  print $ mapMaybe c2type xml

For the above code snippet, say input.xml contains the following XML:

<work>
  <a>1</a>
  <b>2</b>
</work>

Then running the example produces:

$ ./xml.hs 
[MyXML "work" [Leaf "\n  ",MyXML "a" [Leaf "1"],Leaf "\n  ",MyXML "b" [Leaf "2"],Leaf "\n"],Leaf "\n"]
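The whitespace-only Leaf entries in that output come from the indentation between the tags. Here is a minimal sketch of one way to drop them, assuming you never need whitespace-only text. The name c2typeTrim is hypothetical (it is not part of the xml package), and MyXML is repeated without the unused Unit constructor so the snippet compiles on its own:

```haskell
import Data.Char (isSpace)
import Data.Maybe (mapMaybe)
import Text.XML.Light

-- Same shape as the MyXML type above, repeated so this compiles standalone.
data MyXML
  = MyXML String [MyXML] -- Nested XML elements
  | Leaf String          -- Text found inside an element
  deriving (Show, Eq)

-- Like c2type, but text nodes made up only of whitespace are discarded.
c2typeTrim :: Content -> Maybe MyXML
c2typeTrim (Text s)
  | all isSpace (cdData s) = Nothing
  | otherwise              = Just (Leaf (cdData s))
c2typeTrim (CRef _) = Nothing
c2typeTrim (Elem e) =
  Just (MyXML (qName (elName e)) (mapMaybe c2typeTrim (elContent e)))

main :: IO ()
main =
  print (mapMaybe c2typeTrim (parseXML "<work>\n  <a>1</a>\n  <b>2</b>\n</work>\n"))
```

With the same input as before, this should leave only the work element with its a and b children and their "1" and "2" leaves.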

The functions you'll probably find most useful for your more extensive use case include:

(qName . elName) -- Get the name of a tag as a String from an Element
elContent -- Get the direct children (a [Content]) of an Element; recurse on them to walk the tree
elAttribs -- Get the attributes of an Element, e.g. to check the 'type' attributes on some of your tags
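As a rough illustration of those three accessors, here is a small sketch that applies them to an inline fragment shaped like one of your <work> elements. The helper names tagName, childTags and attrPairs are made up for this example; parseXMLDoc, onlyElems, elChildren, attrKey and attrVal come from Text.XML.Light:

```haskell
import Text.XML.Light

-- Hypothetical helper: the tag name of an element as a plain String.
tagName :: Element -> String
tagName = qName . elName

-- Hypothetical helper: names of the direct child elements (text nodes are skipped).
childTags :: Element -> [String]
childTags e = map tagName (onlyElems (elContent e))

-- Hypothetical helper: an element's attributes as (name, value) pairs.
attrPairs :: Element -> [(String, String)]
attrPairs e = [ (qName (attrKey a), attrVal a) | a <- elAttribs e ]

main :: IO ()
main =
  case parseXMLDoc "<work><id type=\"integer\">4640799</id><average_rating>4.46</average_rating></work>" of
    Nothing   -> putStrLn "input is not well-formed XML"
    Just work -> do
      putStrLn (tagName work)                    -- name of the root tag
      print (childTags work)                     -- names of its direct children
      mapM_ (print . attrPairs) (elChildren work) -- attributes of each child
```

Note that elContent only gives you one level of the tree, so anything deeper needs either explicit recursion (as in c2type) or a search function such as findElements.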

To see the general structure of the data types that the XML parser returns for your input, I strongly recommend uncommenting the print xml line in the code example above and inspecting the list of Content values it prints on the command line. That alone should tell you exactly which fields you care about. For example, this is what you get for my more minimal XML input example:

[Elem (Element {elName = QName {qName = "work", qURI = Nothing, qPrefix = Nothing}, elAttribs = [], elContent = [Text (CData {cdVerbatim = CDataText, cdData = "\n  ", cdLine = Just 1}),Elem (Element {elName = QName {qName = "a", qURI = Nothing, qPrefix = Nothing}, elAttribs = [], elContent = [Text (CData {cdVerbatim = CDataText, cdData = "1", cdLine = Just 2})], elLine = Just 2}),Text (CData {cdVerbatim = CDataText, cdData = "\n  ", cdLine = Just 2}),Elem (Element {elName = QName {qName = "b", qURI = Nothing, qPrefix = Nothing}, elAttribs = [], elContent = [Text (CData {cdVerbatim = CDataText, cdData = "2", cdLine = Just 3})], elLine = Just 3}),Text (CData {cdVerbatim = CDataText, cdData = "\n", cdLine = Just 3})], elLine = Just 1}),Text (CData {cdVerbatim = CDataText, cdData = "\n", cdLine = Just 4})]
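Once you know which fields you care about, one possible way to get from Element values to your own records is to look up child tags by name and combine the results in the Maybe applicative. The following is only a sketch under some assumptions: it uses trimmed-down versions of your Book and Author records, uses String instead of Text to stay dependency-free, and the helpers childText, childInt, parseAuthor, parseBook and parseBooks are hypothetical names. findChild, findElements, strContent, unqual and parseXMLDoc are from Text.XML.Light, and readMaybe is from Text.Read:

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)
import Text.XML.Light

-- Simplified versions of the question's records (String instead of Text).
data Author = Author { authorID :: Int, authorName :: String }
  deriving (Show, Eq)

data Book = Book { bID :: Int, bTitle :: String, bAuthor :: Author }
  deriving (Show, Eq)

-- Text of the first direct child with the given unqualified tag name.
childText :: String -> Element -> Maybe String
childText tag e = strContent <$> findChild (unqual tag) e

-- Same as childText, but read as an Int; Nothing if missing or malformed.
childInt :: String -> Element -> Maybe Int
childInt tag e = childText tag e >>= readMaybe

parseAuthor :: Element -> Maybe Author
parseAuthor e = Author <$> childInt "id" e <*> childText "name" e

-- findChild only looks at direct children, so "id" here is the book's id,
-- not the id nested inside <author>.
parseBook :: Element -> Maybe Book
parseBook e =
  Book <$> childInt "id" e
       <*> childText "title" e
       <*> (findChild (unqual "author") e >>= parseAuthor)

-- Every <best_book> under the root that parses successfully.
parseBooks :: Element -> [Book]
parseBooks root = mapMaybe parseBook (findElements (unqual "best_book") root)

-- A cut-down fragment of the response from the question.
sample :: String
sample = unlines
  [ "<results><work><best_book type=\"Book\">"
  , "<id type=\"integer\">3</id>"
  , "<title>Harry Potter and the Sorcerer's Stone (Harry Potter, #1)</title>"
  , "<author><id type=\"integer\">1077326</id><name>J.K. Rowling</name></author>"
  , "</best_book></work></results>"
  ]

main :: IO ()
main = print (maybe [] parseBooks (parseXMLDoc sample))
```

The same pattern should extend to Work and the other records: write one small parser per record and combine them with <$> and <*>. A missing or non-numeric field makes that record's parser return Nothing, so you may prefer Either with an error message if you need to know which field failed.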
