Haskell - List comprehension with infinite lists

This is the piece of code:

primepowers n = foldr merge [] [ map (^i) primes | i <- [1..n] ]  -- (1)

merge :: Ord t => [t] -> [t] -> [t]
merge x [] = x
merge [] y = y
merge (x:xs) (y:ys)
    | x < y = x:merge xs (y:ys)
    | otherwise = y:merge (x:xs) ys

which is equal to the mathematical expression {p^i | p is prime, 1 <= i <= n}.

primes is an infinite list of prime numbers. What I am interested in is the evaluation of (1). These are my thoughts:

If we first just look at [ map (^i) primes | i <- [1..3] ], this would return an infinite list of [[2,3,5,7,11,...],...]. But as we know that the list of p^1 (p is prime) never ends, Haskell will never evaluate [p^2] and [p^3]. Is this just because it is an infinite list, or because of lazy evaluation?

Let's carry on with merge: merge will return [2,3,5,7,11,...]. Is that again because we still have an infinite list, or for some other reason?

Now to foldr: foldr starts evaluating from the back. Here we specifically ask for the rightmost element, which is an infinite list [p^3]. So the evaluation would be like this:

merge (merge (merge [] [p^3]) [p^2]) [p^1]

But we should not forget that these lists are infinite, so how does Haskell deal with that fact?

Could anyone explain to me the evaluation process of the above function?
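For anyone who wants to run the code from the question, here is a minimal self-contained sketch. The question never shows how primes is defined, so the trial-division definition below is an assumption made purely for illustration; merge and primepowers are taken from the question unchanged apart from added type signatures.

```haskell
-- Assumed definition: the question does not show how `primes` is built.
-- Plain trial division by the primes found so far.
primes :: [Integer]
primes = 2 : filter isPrime [3, 5 ..]
  where
    isPrime n = all (\p -> n `mod` p /= 0) (takeWhile (\p -> p * p <= n) primes)

-- Ordered merge of two ascending lists, as in the question.
merge :: Ord t => [t] -> [t] -> [t]
merge x [] = x
merge [] y = y
merge (x:xs) (y:ys)
    | x < y     = x : merge xs (y:ys)
    | otherwise = y : merge (x:xs) ys

-- Expression (1) from the question.
primepowers :: Int -> [Integer]
primepowers n = foldr merge [] [ map (^i) primes | i <- [1..n] ]
```

Loaded into GHCi, take 10 (primepowers 3) gives [2,3,4,5,7,8,9,11,13,17], so the finite fold does produce output even though every inner list is infinite.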

The trick is to define it as

primepowers n = foldr (\(x:xs) r-> x:merge xs r) 
                      [] [ map (^i) primes | i <- [1..n] ]

(as seen in Richard Bird's code in the article O'Neill, Melissa E., "The Genuine Sieve of Eratosthenes").

The lists to the right of the current one all start with bigger numbers, so there is no chance of their merged list ever producing a value smaller than or equal to the current list's head; that head can therefore be produced unconditionally.

That way it will also explore only as many of the internal streams as needed:

GHCi> let pps_list = [ map (^i) primes | i <- [1..42] ]
GHCi> :sprint pps_list
pps_list = _
GHCi> take 20 $ foldr (\(x:xs) r-> x:merge xs r) [] pps_list
[2,3,4,5,7,8,9,11,13,16,17,19,23,25,27,29,31,32,37,41]
GHCi> :sprint pps_list
pps_list = (2 : 3 : 5 : 7 : 11 : 13 : 17 : 19 : 23 : 29 : 31 : 37 :
            41 : _) :
           (4 : 9 : 25 : 49 : _) : (8 : 27 : 125 : _) : (16 : 81 : _) :
           (32 : 243 : _) : (64 : _) : _

To your question per se: foldr f z [a,b,c,...,n] = f a (f b (f c (... (f n z)...))), so (writing ps_n for map (^n) primes, and ps for ps_1) your expression is equivalent to

merge ps (merge ps_2 (merge ps_3 (... (merge ps_n [])...)))
= merge ps r
     where r = merge ps_2 (merge ps_3 (... (merge ps_n [])...))

because you use merge as your combining function. Notice that the leftmost merge springs into action first, while the expression for r isn't even built yet (because its value isn't needed yet; Haskell's evaluation is by need).

Now, this merge demands the head value of both its first and its second argument (as written, it actually checks the second argument first, for being []).

The first argument isn't the problem, but the second is the result of folding all the rest of the lists (the "r" in foldr's combining function stands for "recursive result"). Thus every element in the list of lists will be visited and its head element forced, and all this just to produce the very first value, the head of the result list, by the leftmost merge call.

In my code, the combining function does not at first demand the head of its second argument. That is what limits its exploration of the whole list of lists, makes it more economical in its demands, and thus more productive (it will even work if you omit the n altogether, i.e. use i <- [1..]).
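To illustrate the remark about omitting n, here is a small sketch of the tweaked fold run over an infinite list of infinite lists. The primes definition is an assumed trial-division stand-in (the thread never shows one), and the names mg and primepowersAll are chosen here for illustration.

```haskell
-- Assumed stand-in for `primes`; not from the original thread.
primes :: [Integer]
primes = 2 : filter isPrime [3, 5 ..]
  where
    isPrime n = all (\p -> n `mod` p /= 0) (takeWhile (\p -> p * p <= n) primes)

merge :: Ord t => [t] -> [t] -> [t]
merge x [] = x
merge [] y = y
merge (x:xs) (y:ys)
    | x < y     = x : merge xs (y:ys)
    | otherwise = y : merge (x:xs) ys

-- Combining function that emits the head of its first argument
-- before looking at the recursive result `r` at all.
mg :: Ord t => [t] -> [t] -> [t]
mg (x:xs) r = x : merge xs r
mg []     r = r

-- No upper bound on the exponent: the outer list is infinite too.
primepowersAll :: [Integer]
primepowersAll = foldr mg [] [ map (^i) primes | i <- [(1 :: Int) ..] ]
```

In GHCi, take 20 primepowersAll returns the same 20 values as the session above. This relies on each inner list starting with a smaller number than every list to its right, which holds here because the heads are 2, 4, 8, 16 and so on.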


Your example Haskell expression [ map (^i) primes | i <- [1..3] ] returns a finite list of length 3, each element being an infinite list: [[2,3,5,7,11,...],[4,9,25,...],[8,27,125,...]], so foldr has no problem translating it into merge [2,3,5,7,11,...] (merge [4,9,25,...] (merge [8,27,125,...] [])):

foldr merge [] [ map (^i) primes | i <- [1..3] ]
= merge [2,3,5,7,11,...] (foldr merge [] [ map (^i) primes | i <- [2..3] ])
= merge [2,3,5,7,11,...] (merge [4,9,25,...] (foldr merge [] [ map (^i) primes | i <- [3..3] ]))
= merge [2,3,5,7,11,...] (merge [4,9,25,...] (merge [8,27,125,...] (foldr merge [] [])))
= merge [2,3,5,7,11,...] (merge [4,9,25,...] (merge [8,27,125,...] []))
= merge [2,3,5,7,11,...] (merge [4,9,25,...] [8,27,125,...])
= merge [2,3,5,7,11,...] (4:merge [9,25,...] [8,27,125,...])
= 2:merge [3,5,7,11,...] (4:merge [9,25,...] [8,27,125,...])
= 2:3:merge [5,7,11,...] (4:merge [9,25,...] [8,27,125,...])
= 2:3:4:merge [5,7,11,...] (merge [9,25,...] [8,27,125,...])
= 2:3:4:merge [5,7,11,...] (8:merge [9,25,...] [27,125,...])
= 2:3:4:5:merge [7,11,...] (8:merge [9,25,...] [27,125,...])
.....

As you can see, the rightmost inner list is examined first, because merge is strict in (i.e. demands to know) both its arguments, as explained above. For [ map (^i) primes | i <- [1..42] ] it would expand all 42 of them, and examine the heads of all of them, before producing even the head element of the result.
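One way to observe this difference empirically is to plant a deliberately failing inner list at the far right and ask only for the head of the result. The sketch below does that; the names mg, inputs and safeHead are hypothetical helpers introduced here, not part of the original answer.

```haskell
import Control.Exception (ErrorCall (..), evaluate, try)

merge :: Ord t => [t] -> [t] -> [t]
merge x [] = x
merge [] y = y
merge (x:xs) (y:ys)
    | x < y     = x : merge xs (y:ys)
    | otherwise = y : merge (x:xs) ys

-- The tweaked combining function: emits its head without touching `r`.
mg :: Ord t => [t] -> [t] -> [t]
mg (x:xs) r = x : merge xs r
mg []     r = r

-- The last inner list is a bomb: forcing it raises an error.
inputs :: [[Int]]
inputs = [[1,4,7], [2,5,8], error "third list was forced"]

-- Compute the head of a list, reporting a call to `error` as Left.
safeHead :: [Int] -> IO (Either String Int)
safeHead xs = do
  r <- try (evaluate (head xs))
  pure $ case r of
    Left (ErrorCall msg) -> Left msg
    Right v              -> Right v
```

With these definitions, safeHead (foldr merge [] inputs) comes back as Left "third list was forced", because the plain merge fold inspects every inner list before it yields anything, while safeHead (foldr mg [] inputs) comes back as Right 1.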


With the tweaked function, mg (x:xs) r = x:merge xs r, the evaluation proceeds as

foldr mg [] [ map (^i) primes | i <- [1..3] ]
= mg [2,3,5,7,11,...] (foldr mg [] [ map (^i) primes | i <- [2..3] ])
= 2:merge [3,5,7,11,...] (foldr mg [] [ map (^i) primes | i <- [2..3] ])
= 2:merge [3,5,7,11,...] (mg [4,9,25,...] (foldr mg [] [ map (^i) primes | i <- [3..3] ]))
= 2:merge [3,5,7,11,...] (4:merge [9,25,...] (foldr mg [] [ map (^i) primes | i <- [3..3] ]))
= 2:3:merge [5,7,11,...] (4:merge [9,25,...] (foldr mg [] [ map (^i) primes | i <- [3..3] ]))
= 2:3:4:merge [5,7,11,...] (merge [9,25,...] (foldr mg [] [ map (^i) primes | i <- [3..3] ]))
= 2:3:4:merge [5,7,11,...] (merge [9,25,...] (mg [8,27,125,...] (foldr mg [] [])))
= 2:3:4:merge [5,7,11,...] (merge [9,25,...] (8:merge [27,125,...] (foldr mg [] [])))
= 2:3:4:merge [5,7,11,...] (8:merge [9,25,...] (merge [27,125,...] (foldr mg [] [])))
= 2:3:4:5:merge [7,11,...] (8:merge [9,25,...] (merge [27,125,...] (foldr mg [] [])))
.....

so it starts producing results much sooner, without expanding much of the inner lists. This just follows the definition of foldr,

foldr f z (x:xs) = f x (foldr f z xs)

where, because of laziness, (foldr f z xs) is not evaluated right away if f does not demand its value (or a part of it, like its head).
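As a small illustration of that last point, the sketch below folds over an infinite list with a combining function that sometimes ignores its second argument. myFoldr simply restates the definition above under a different name to avoid clashing with the Prelude; takeUpTo is a hypothetical helper introduced only for this example.

```haskell
-- Same equations as the Prelude's foldr on lists, renamed to avoid a clash.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)

-- Keep elements while they do not exceed the limit.
-- When x > limit, `step` never looks at `r`, so the rest of the
-- (possibly infinite) list is never folded.
takeUpTo :: Int -> [Int] -> [Int]
takeUpTo limit = myFoldr step []
  where
    step x r
      | x > limit = []
      | otherwise = x : r
```

Evaluating takeUpTo 5 [1..] terminates with [1,2,3,4,5] even though the input is infinite, because the recursive call is only a suspended computation until step asks for it.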

The lists being merged are infinite, but that doesn't matter.

What matters is that you only have a finite number of lists being merged, and so to compute the next element of the merge you only need to perform a finite number of comparisons.

To compute the head of merge xs ys you only need to compute the head of xs and the head of ys. So by induction, if you have a finite tree of merge operations, you can compute the head of the overall merge in finite time.

It is true that merge needs to scan its whole input lists to produce its whole output. However, the key point is that every element of the output depends only on finite prefixes of the input lists.

For instance, consider take 10 (map (*2) [1..]). To compute the first 10 elements, you do not need to examine the whole of [1..]. Indeed, map will not scan the whole infinite list and only "after that" start returning output: if it behaved like that, it would simply hang on infinite lists. This "streaming" property of map is given by laziness and the definition of map:

map f [] = []
map f (x:xs) = f x : map f xs

The last line reads "yield f x, and then proceed with the rest", so the caller gets to inspect f x before map produces its whole output. By comparison,

map f xs = go xs []
  where go []     acc = acc
        go (x:xs) acc = go xs (acc ++ [f x])

would be another definition of map, one which starts generating its output only after its input has been consumed. It is equivalent on finite lists (performance aside), but not on infinite ones, where it hangs.

If you want to test empirically that your merge is indeed working lazily, try this:
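The two definitions can be compared side by side if they are given names that do not clash with the Prelude. In this sketch mapStream and mapBatch are hypothetical names for the streaming and the accumulating version respectively.

```haskell
-- Streaming version: yields `f x` before recursing.
mapStream :: (a -> b) -> [a] -> [b]
mapStream _ []     = []
mapStream f (x:xs) = f x : mapStream f xs

-- Accumulating version: returns nothing until the input is exhausted.
mapBatch :: (a -> b) -> [a] -> [b]
mapBatch f xs0 = go xs0 []
  where
    go []     acc = acc
    go (x:xs) acc = go xs (acc ++ [f x])
```

On a finite list the two agree, for example on [1..5] with (*2). On an infinite list, take 5 (mapStream (*2) [1..]) returns [2,4,6,8,10], whereas the same call with mapBatch never returns.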

take 10 $ merge (10:20:30:error "end of 1") (5:15:25:35:error "end of 2")

Feel free to play by changing the constants. You will see an exception being printed on screen, but only after a few list elements have already been produced by merge.

[map (^i) primes | i <- [1..3]] returns just a thunk. Nothing is evaluated for now. You could try this:

xs = [x | x <- [1..], error ""]

main = print $ const 0 xs

This program prints 0, so error "" was never evaluated.

You can think of foldr as being defined like this:

foldr f z  []    = z
foldr f z (x:xs) = f x (foldr f z xs)

Then

primepowers n = foldr merge [] [map (^i) primes | i <- [1..3]]

evaluates like this (once it is forced):

merge thunk1 (merge thunk2 (merge thunk3 []))

where thunkn is a suspended computation of the primes raised to the n-th power. Now the first merge forces evaluation of thunk1 and of merge thunk2 (merge thunk3 []), which are evaluated to weak head normal form (WHNF). Forcing merge thunk2 (merge thunk3 []) in turn forces thunk2 and merge thunk3 []. merge thunk3 [] reduces to thunk3, and then thunk3 is forced. So the expression becomes

merge (2 : thunk1') (merge (4 : thunk2') (8 : thunk3'))

Which, due to the definition of merge, reduces to

merge (2 : thunk1') (4 : merge thunk2' (8 : thunk3'))

And again:

2 : merge thunk1' (4 : merge thunk2' (8 : thunk3'))

Now merge forces thunk1', but not the rest of the expression, because it's already in WHNF:

2 : merge (3 : thunk1'') (4 : merge thunk2' (8 : thunk3'))
2 : 3 : merge thunk1'' (4 : merge thunk2' (8 : thunk3'))
2 : 3 : merge (5 : thunk1''') (4 : merge thunk2' (8 : thunk3'))
2 : 3 : 4 : merge (5 : thunk1''') (merge thunk2' (8 : thunk3'))
2 : 3 : 4 : merge (5 : thunk1''') (merge (9 : thunk2'') (8 : thunk3'))
2 : 3 : 4 : merge (5 : thunk1''') (8 : merge (9 : thunk2'') thunk3')
2 : 3 : 4 : 5 : merge thunk1''' (8 : merge (9 : thunk2'') thunk3')
...

Intuitively, only those values that are needed get evaluated. Read this for a better explanation.


You can also merge an infinite list of infinite lists. The simplest way would be:

interleave (x:xs) ys = x : interleave ys xs

primepowers = foldr1 interleave [map (^i) primes | i <- [1..]]

The interleave function interleaves two infinite lists; for example, interleave [1,3..] [2,4..] is equal to [1..]. So take 20 primepowers gives you [2,4,3,8,5,9,7,16,11,25,13,27,17,49,19,32,23,121,29,125]. But this list is unordered; we can do better.

[map (^i) primes | i <- [1..]] reduces to
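The two claims about interleave can be checked with a self-contained sketch. As elsewhere in this thread, primes is not shown, so a simple trial-division definition is assumed here; the name primepowersUnordered and the extra equation for an empty first list are additions made for this example.

```haskell
-- Assumed stand-in for `primes`; not from the original thread.
primes :: [Integer]
primes = 2 : filter isPrime [3, 5 ..]
  where
    isPrime n = all (\p -> n `mod` p /= 0) (takeWhile (\p -> p * p <= n) primes)

-- Alternate elements, starting with the first list. Only the first
-- argument is pattern matched, so the second one stays unevaluated.
interleave :: [a] -> [a] -> [a]
interleave (x:xs) ys = x : interleave ys xs
interleave []     ys = ys

-- Every prime power appears eventually, but not in ascending order.
primepowersUnordered :: [Integer]
primepowersUnordered =
  foldr1 interleave [ map (^i) primes | i <- [(1 :: Int) ..] ]
```

Here take 10 (interleave [1,3..] [2,4..]) equals [1..10], and take 20 primepowersUnordered reproduces the unordered list quoted above. The fold is productive for the same reason as before: interleave emits an element before it needs the recursive result.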

[[2,3,5...]
,[4,9,25...]
,[8,27,125...]
...
]

We have the precondition that every n-th list contains elements that are smaller than the head of the (n+1)-th list. We can extract such elements from the first list (2 and 3 are smaller than 4), and now we have this:

[[5,7,11...]
,[4,9,25...]
,[8,27,125...]
...
]

The precondition no longer holds, so we must fix this by swapping the first list and the second:

[[4,9,25...]
,[5,7,11...]
,[8,27,125...]
...
]

Now we extract 4 and swap the first list and the second:

[[5,7,11...]
,[9,25,49...]
,[8,27,125...]
...
]

But the precondition still doesn't hold, since the head of the second list (9) is not smaller than the head of the third list (8). So we do the same trick again:

[[5,7,11...]
,[8,27,125...]
,[9,25,49...]
...
]

And now we can extract elements again. Repeating the process indefinitely gives us an ordered list of prime powers. Here is the code:

swap xs@(x:_) xss = xss1 ++ xs : xss2 where
    (xss1, xss2) = span ((< x) . head) xss 

mergeAll (xs:xss@((x:_):_)) = xs1 ++ mergeAll (swap xs2 xss) where
    (xs1, xs2) = span (< x) xs

primepowers = mergeAll [map (^i) primes | i <- [1..]]

For example, take 20 primepowers is equal to [2,3,4,5,7,8,9,11,13,16,17,19,23,25,27,29,31,32,37,41].

This is probably not the nicest way to obtain an ordered list of prime powers, but it's a fairly easy one.

EDIT

Look at Will Ness' answer for a better solution, which is both easier and nicer.
