A specific use case of algebraic data types
As an exercise, I wrote a generic enumerator for scraping sites. It is complete and works fine, but I have a question about the design. The code is here if you want to look at it: https://github.com/mindreader/scrape-enumerator
The basic idea is that I wanted an enumerator that emits site-defined entries from paged sources such as search engines and blogs: you fetch a page, it contains 25 entries, and you want them one entry at a time. At the same time, I didn't want to write the plumbing for every site, so I wanted a generic interface. What I came up with is this (it uses type families):
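    {-# LANGUAGE TypeFamilies #-}
    -- Assumed import: L is taken to be the lazy ByteString module.
    -- URL comes from the project's own imports, which are not shown here.
    import qualified Data.ByteString.Lazy as L

    class SiteEnum a where
      type Result a :: *                                    -- per-site entry type
      urlSource   :: a -> InputUrls (Int, Int)              -- where the pages come from
      enumResults :: a -> L.ByteString -> Maybe [Result a]  -- parse one page body

    data InputUrls state
      = UrlSet [URL]                                        -- pregenerated URLs
      | UrlFunc state (state -> (state, URL))               -- initial state and generator
      | UrlPageDependent URL (L.ByteString -> Maybe URL)    -- next link found in the body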
To work on every kind of site, this needs a URL source of some sort. That can be a (possibly infinite) list of pregenerated URLs; an initial state plus a function that generates URLs from it (for example when the URLs contain &page=1, &page=2, and so on); or, for really awkward sites like Google, an initial URL plus a function that searches the page body for the next link. A site makes a data type an instance of SiteEnum and gives Result a site-dependent type; the enumerator then deals with all the I/O, and you don't have to think about it. This works perfectly, and I have implemented one site with it.
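To make the three kinds of URL source concrete, here is a rough sketch of how a driver might walk each of them. It is not the code from the repository: urlsVisited, fakeFetch, nextLink, pageStep, and the example.com URLs are all made up for illustration, URL is replaced by a String stand-in, and the HTTP fetch is simulated by a pure function.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Stand-in for the real URL type, which the original post does not show.
type URL = String

data InputUrls state
  = UrlSet [URL]
  | UrlFunc state (state -> (state, URL))
  | UrlPageDependent URL (L.ByteString -> Maybe URL)

-- Hypothetical driver: list the URLs each source would visit.
-- The first argument is a pure stand-in for fetching a page body.
urlsVisited :: (URL -> L.ByteString) -> InputUrls state -> [URL]
urlsVisited _ (UrlSet urls) = urls
urlsVisited _ (UrlFunc s0 step) = go s0
  where
    -- Thread the state through the generator, lazily and without end.
    go s = let (s', url) = step s in url : go s'
urlsVisited fetch (UrlPageDependent start next) = go start
  where
    -- Fetch the current page and look in its body for the next link.
    go url = url : maybe [] go (next (fetch url))

-- Fake site: each page body is simply the URL of the next page.
fakeFetch :: URL -> L.ByteString
fakeFetch "http://example.com/a" = L.pack "http://example.com/b"
fakeFetch "http://example.com/b" = L.pack "http://example.com/c"
fakeFetch _ = L.empty

-- Fake link extractor: an empty body means there is no next page.
nextLink :: L.ByteString -> Maybe URL
nextLink body
  | L.null body = Nothing
  | otherwise = Just (L.unpack body)

-- Generator for URLs of the &page=N kind.
pageStep :: Int -> (Int, URL)
pageStep n = (n + 1, "http://example.com/?page=" ++ show n)

main :: IO ()
main = do
  print (urlsVisited fakeFetch (UrlSet ["http://example.com/1"]))
  print (take 3 (urlsVisited fakeFetch (UrlFunc (1 :: Int) pageStep)))
  print (urlsVisited fakeFetch (UrlPageDependent "http://example.com/a" nextLink))
```

Only the UrlFunc case ever touches the state parameter; the other two constructors carry it along unused.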
My annoyance with this implementation is the InputUrls datatype. When I use UrlFunc, everything is golden. When I use UrlSet or UrlPageDependent, it is less pleasant: the state type is left ambiguous, and I have to annotate the value as :: InputUrls () to get it to compile. This seems totally unnecessary, because given the way the program is written that type variable will never be used for the majority of sites, but I don't know how to get around it. I find that I want to use types like this in many different contexts, and I always end up with stray type variables that are needed only in certain constructors of the datatype, which makes me feel I shouldn't be using them this way. Is there a better way of doing this?
Why do you need the UrlFunc case at all? From what I understand, the only thing you do with the state function is use it to build a list like the one in UrlSet anyway. So instead of storing the state function, just store the resulting list. That way you can eliminate the state type variable from your data type, which should eliminate the ambiguity problems.
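A minimal sketch of that suggestion, assuming the state function is only ever used to produce URLs. The helper urlsFrom, the value pagedSearch, and the example.com URLs are hypothetical names introduced here, and URL is again a String stand-in for the real type. The state is unfolded into a lazy list up front, so InputUrls no longer needs a type parameter.

```haskell
import Data.List (unfoldr)
import qualified Data.ByteString.Lazy as L

-- Stand-in for the real URL type, which the original post does not show.
type URL = String

-- InputUrls without the UrlFunc constructor and without a type parameter.
data InputUrls
  = UrlSet [URL]
  | UrlPageDependent URL (L.ByteString -> Maybe URL)

-- Hypothetical helper that takes the place of UrlFunc: it runs the
-- generator lazily and stores the resulting (possibly infinite) list.
urlsFrom :: state -> (state -> (state, URL)) -> InputUrls
urlsFrom s0 step = UrlSet (unfoldr next s0)
  where
    next s = let (s', url) = step s in Just (url, s')

-- Example source for URLs of the &page=N kind.
pagedSearch :: InputUrls
pagedSearch =
  urlsFrom (1 :: Int) (\n -> (n + 1, "http://example.com/search?page=" ++ show n))

main :: IO ()
main = case pagedSearch of
  UrlSet urls -> mapM_ putStrLn (take 3 urls)
  UrlPageDependent _ _ -> pure ()
```

Because the list is lazy, an infinite sequence of page URLs costs nothing until the enumerator asks for the next one, and the state type stays local to urlsFrom instead of leaking into every use of InputUrls.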