A specific use case for algebraic data types

Question

I was writing a generic enumerator to scrape sites as an exercise. It is complete and works fine, but I have a question. The code is here if you want to look at it: https://github.com/mindreader/scrape-enumerator

The basic idea is that I wanted an enumerator that spits out site-defined entries from paginated sources such as search engines and blogs: things where you have to fetch a page that holds, say, 25 entries, and you want one entry at a time. At the same time I didn't want to write the plumbing for every site, so I wanted a generic interface. What I came up with is this (it uses type families):

class SiteEnum a where
  type Result a :: *
  urlSource :: a -> InputUrls (Int,Int)
  enumResults :: a -> L.ByteString -> Maybe [Result a]

data InputUrls state =
  UrlSet [URL] |
  UrlFunc state (state -> (state,URL)) |
  UrlPageDependent URL (L.ByteString -> Maybe URL)

To work on every type of site, this requires a URL source of some sort. It could be a list (possibly infinite) of pregenerated URLs, or an initial state plus a function that generates URLs from it (for example, when the URLs contain &page=1, &page=2, etc.). For really awkward pages like Google's, you give an initial URL and a function that searches the body for the next link, which is then used. Your site makes a data type an instance of SiteEnum and gives Result a site-dependent type; the enumerator then deals with all the I/O, and you don't have to think about it. This works perfectly, and I implemented one site with it.
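For illustration, an instance for a made-up site might look like the sketch below. ExampleBlog, the example.com URL, the line-per-entry parsing, and the use of String for URL are all assumptions made so the snippet is self-contained; they are not taken from the repository.

```haskell
{-# LANGUAGE TypeFamilies #-}
module SiteEnumSketch where

import qualified Data.ByteString.Lazy.Char8 as L

-- Assumption: the real URL type is not shown in the post; String stands in.
type URL = String

data InputUrls state
  = UrlSet [URL]
  | UrlFunc state (state -> (state, URL))
  | UrlPageDependent URL (L.ByteString -> Maybe URL)

class SiteEnum a where
  type Result a :: *
  urlSource   :: a -> InputUrls (Int, Int)
  enumResults :: a -> L.ByteString -> Maybe [Result a]

-- Hypothetical site: a blog whose pages are numbered ?page=1, ?page=2, ...
data ExampleBlog = ExampleBlog

instance SiteEnum ExampleBlog where
  -- Each entry is just a raw line of the page body in this sketch.
  type Result ExampleBlog = L.ByteString

  -- State is (current page, entries per page); only the page number advances.
  urlSource _ = UrlFunc (1, 25) next
    where
      next (page, perPage) =
        ((page + 1, perPage), "http://example.com/blog?page=" ++ show page)

  -- Treat every non-empty line as one entry; Nothing means "no entries here".
  enumResults _ body =
    case filter (not . L.null) (L.lines body) of
      [] -> Nothing
      ls -> Just ls
```

With this, the generic enumerator only needs the instance: it asks urlSource for the next URL, fetches the page, and hands the body to enumResults.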

The annoyance with this implementation is the InputUrls datatype. When I use UrlFunc, everything is golden. When I use UrlSet or UrlPageDependent, it isn't all fun and games, because the state type is left undetermined and I have to annotate the value as :: InputUrls () for it to compile. This seems totally unnecessary, because given the way the program is written, that type variable will never be used for the majority of sites, but I don't know how to get around it. I find that I want to use types like this in a lot of different contexts, and I always end up with stray type variables that are only needed in certain pieces of the datatype, which makes me feel I shouldn't be using it this way. Is there a better way of doing this?

Answer 1

Why do you need the UrlFunc case at all? From what I understand, the only thing you do with the state function is use it to build a list like the one in UrlSet anyway. So instead of storing the state function, just store the resulting list. That way you can eliminate the state type variable from your data type, which should eliminate the ambiguity problems.
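A minimal sketch of that idea, assuming URL is a plain String and using a hypothetical helper named urlsFrom (neither comes from the original code): the seed and step function are unfolded into a lazy list up front, so InputUrls no longer needs a type parameter.

```haskell
module InputUrlsSketch where

import qualified Data.ByteString.Lazy as L

-- Assumption: String stands in for the real URL type.
type URL = String

-- InputUrls without the state type variable: the UrlFunc case is gone.
data InputUrls
  = UrlSet [URL]
  | UrlPageDependent URL (L.ByteString -> Maybe URL)

-- Hypothetical helper: turn a seed and a step function into a lazy,
-- possibly infinite list of URLs, which is what UrlFunc used to describe.
urlsFrom :: state -> (state -> (state, URL)) -> InputUrls
urlsFrom s0 step = UrlSet (go s0)
  where
    go s = let (s', url) = step s in url : go s'

-- Example: pages numbered from 1 (example.com is a placeholder).
pagedSearch :: InputUrls
pagedSearch =
  urlsFrom (1 :: Int) (\n -> (n + 1, "http://example.com/search?page=" ++ show n))
```

Because the list is lazy, the enumerator still only computes as many URLs as it actually fetches, and sites that use UrlSet or UrlPageDependent need no annotation at all.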

Answer 1 (2, accepted, 2011-11-21 19:39:42)
