Haskell - List comprehension with infinite lists
This is the piece of code:
primepowers n = foldr merge [] [ map (^i) primes | i <- [1..n] ] -- (1)
merge :: (Ord t) => [t] -> [t] -> [t]
merge x [] = x
merge [] y = y
merge (x:xs) (y:ys)
  | x < y = x:merge xs (y:ys)
  | otherwise = y:merge (x:xs) ys
which is equal to the mathematical expression {p^i | p is prime, 1 <= i <= n}. primes returns an infinite list of prime numbers. What I am interested in is the evaluation of (1). These are my thoughts:
If we first just look at [ map (^i) primes | i <- [1..3] ], this would return an infinite list of [[2,3,5,7,11,...],...]. But as we know p^1 (p is prime) never ends, Haskell will never evaluate [p^2] and [p^3]. Is this just because it is an infinite list or because of lazy evaluation?
Let's carry on with merge: merge will return [2,3,5,7,9,11,...], because again we still have an infinite list, or because of some other reason?
Now to foldr: foldr starts evaluating from the back. Here we specifically ask for the rightmost element, which is an infinite list [p^3]. So the evaluation would be like this:
merge (merge (merge [] [p^3]) [p^2]) [p^1]
But we should not forget that these lists are infinite, so how does Haskell deal with that fact?
Could anyone explain to me the evaluation process of the above function?
The trick is to define it as
primepowers n = foldr (\(x:xs) r-> x:merge xs r)
                      [] [ map (^i) primes | i <- [1..n] ]
(as seen in Richard Bird's code in the article O'Neill, Melissa E., "The Genuine Sieve of Eratosthenes").
The lists to the right of the current one all start with bigger numbers, so there is no chance of their merged list ever producing a value smaller than or equal to the current list's head. That head can therefore be produced unconditionally.
That way it will also explore only as many of the internal streams as needed:
GHCi> let pps_list = [ map (^i) primes | i <- [1..42] ]
GHCi> :sprint pps_list
pps_list = _
GHCi> take 20 $ foldr (\(x:xs) r-> x:merge xs r) [] pps_list
[2,3,4,5,7,8,9,11,13,16,17,19,23,25,27,29,31,32,37,41]
GHCi> :sprint pps_list
pps_list = (2 : 3 : 5 : 7 : 11 : 13 : 17 : 19 : 23 : 29 : 31 : 37 :
41 : _) :
(4 : 9 : 25 : 49 : _) : (8 : 27 : 125 : _) : (16 : 81 : _) :
(32 : 243 : _) : (64 : _) : _
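To reproduce the session above outside GHCi, a module along the following lines should work. It is only a sketch: the post never shows the definition of primes, so the trial-division version here is an assumed stand-in, and any correct infinite list of primes would do.

```haskell
module Main where

-- Assumption: the original post does not show `primes`. This trial-division
-- definition is only a stand-in producing the infinite list 2,3,5,7,11,...
primes :: [Integer]
primes = 2 : filter isPrime [3,5..]
  where
    isPrime n = all (\p -> n `mod` p /= 0) (takeWhile (\p -> p * p <= n) primes)

-- merge exactly as given in the question
merge :: Ord t => [t] -> [t] -> [t]
merge x [] = x
merge [] y = y
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- The combining function emits the head of the current list first and only
-- then merges the tail with the (still unevaluated) rest of the fold.
primepowers :: Int -> [Integer]
primepowers n = foldr (\(x:xs) r -> x : merge xs r) [] [ map (^i) primes | i <- [1..n] ]

main :: IO ()
main = print (take 20 (primepowers 42))
```

Running it should print the same twenty numbers as the GHCi session.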
To your question per se, foldr f z [a,b,c,...,n] = f a (f b (f c (... (f n z)...))), so (writing ps_n for map (^n) primes), your expression is equivalent to
merge ps (merge ps_2 (merge ps_3 (... (merge ps_n [])...)))
= merge ps r
where r = merge ps_2 (merge ps_3 (... (merge ps_n [])...))
because you use merge as your combining function. Notice that the leftmost merge springs into action first, while the expression for r isn't even built yet (because its value wasn't yet needed; Haskell's evaluation is by need).
Now, this merge demands the head value of both its first and second argument (as written, it actually checks the second argument first, for being []).
The first argument isn't the problem, but the second is the result of folding all the rest of the lists ("r" in foldr's combining function stands for "recursive result"). Thus, each element in the list will be visited and its head element forced, and all this just to produce the very first value, the head of the result list, by the leftmost merge call...
In my code, the combining function does not at first demand the head of its second argument list. That's what limits its exploration of the whole list of lists, makes it more economical in its demands, and thus more productive (it will even work if you just omit the n altogether).
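The difference in how far each fold explores the outer list can be observed directly. The sketch below uses a hypothetical probe list whose tail raises an error when touched; mg is the tweaked combining function, and the names probe and main are only for illustration.

```haskell
module Main where

import Control.Exception (ErrorCall, evaluate, try)

merge :: Ord t => [t] -> [t] -> [t]
merge x [] = x
merge [] y = y
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Tweaked combining function: emit the head before looking at r.
mg :: Ord t => [t] -> [t] -> [t]
mg (x:xs) r = x : merge xs r

-- Hypothetical probe: two real inner lists, then an outer tail that
-- raises an error as soon as anything tries to inspect it.
probe :: [[Int]]
probe = [1,3,5] : [2,4,6] : error "outer list explored too far"

main :: IO ()
main = do
  -- Only the first two inner lists are needed for the first two results.
  print (take 2 (foldr mg [] probe))
  -- With plain merge, even the head forces the whole outer list.
  r <- try (evaluate (head (foldr merge [] probe))) :: IO (Either ErrorCall Int)
  putStrLn (either (\e -> "plain merge failed: " ++ show e) show r)
```

The first line printed is [1,2]; the second reports the error, because the plain merge checks its second argument for [] all the way down the fold before producing anything.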
Your example Haskell expression [ map (^i) primes | i <- [1..3] ] returns a finite list of length 3, each element being an infinite list: [[2,3,5,7,11,...],[4,9,25,...],[8,27,125,...]], so foldr has no problem translating it into merge [2,3,5,7,11,...] (merge [4,9,25,...] (merge [8,27,125,..] [])):
foldr merge [] [ map (^i) primes | i <- [1..3] ]
= merge [2,3,5,7,11,...] (foldr merge [] [ map (^i) primes | i <- [2..3] ])
= merge [2,3,5,7,11,...] (merge [4,9,25,...] (foldr merge [] [ map (^i) primes | i <- [3..3] ]))
= merge [2,3,5,7,11,...] (merge [4,9,25,...] (merge [8,27,125,..] (foldr merge [] [])))
= merge [2,3,5,7,11,...] (merge [4,9,25,...] (merge [8,27,125,..] []))
= merge [2,3,5,7,11,...] (merge [4,9,25,...] [8,27,125,..])
= merge [2,3,5,7,11,...] (4:merge [9,25,...] [8,27,125,..])
= 2:merge [3,5,7,11,...] (4:merge [9,25,...] [8,27,125,..])
= 2:3:merge [5,7,11,...] (4:merge [9,25,...] [8,27,125,..])
= 2:3:4:merge [5,7,11,...] (merge [9,25,...] [8,27,125,..])
= 2:3:4:merge [5,7,11,...] (8:merge [9,25,...] [27,125,..])
= 2:3:4:5:merge [7,11,...] (8:merge [9,25,...] [27,125,..])
..........
As you can see, the rightmost inner list is examined first, because merge is strict in (i.e. demands to know) both its arguments, as explained above. For [ map (^i) primes | i <- [1..42] ] it would expand all 42 of them, and examine the heads of all of them, before producing even the head element of the result.
With the tweaked function, mg (x:xs) r = x:merge xs r, the evaluation proceeds as
foldr mg [] [ map (^i) primes | i <- [1..3] ]
= mg [2,3,5,7,11,...] (foldr mg [] [ map (^i) primes | i <- [2..3] ])
= 2:merge [3,5,7,11,...] (foldr mg [] [ map (^i) primes | i <- [2..3] ])
= 2:merge [3,5,7,11,...] (mg [4,9,25,...] (foldr mg [] [ map (^i) primes | i <- [3..3] ]))
= 2:merge [3,5,7,11,...] (4:merge [9,25,...] (foldr mg [] [ map (^i) primes | i <- [3..3] ]))
= 2:3:merge [5,7,11,...] (4:merge [9,25,...] (foldr mg [] [ map (^i) primes | i <- [3..3] ]))
= 2:3:4:merge [5,7,11,...] (merge [9,25,...] (foldr mg [] [ map (^i) primes | i <- [3..3] ]))
= 2:3:4:merge [5,7,11,...] (merge [9,25,...] (mg [8,27,125,..] (foldr mg [] [])))
= 2:3:4:merge [5,7,11,...] (merge [9,25,...] (8:merge [27,125,..] (foldr mg [] [])))
= 2:3:4:merge [5,7,11,...] (8:merge [9,25,...] (merge [27,125,..] (foldr mg [] [])))
= 2:3:4:5:merge [7,11,...] (8:merge [9,25,...] (merge [27,125,..] (foldr mg [] [])))
..........
so it starts producing the results much sooner, without expanding much of the inner lists. This just follows the definition of foldr,
foldr f z (x:xs) = f x (foldr f z xs)
where, because of the laziness, (foldr f z xs) is not evaluated right away if f does not demand its value (or a part of it, like its head).
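A small illustration of that point, as a sketch: myFoldr is a local copy of the definition (the name is hypothetical, chosen to avoid clashing with the Prelude), applied to an infinite list with combining functions that demand little or nothing of the recursive result.

```haskell
module Main where

-- Local copy of the foldr definition quoted above (hypothetical name).
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)

-- The combining function ignores the recursive result entirely,
-- so the fold over the rest of the infinite list is never started.
firstElem :: Integer
firstElem = myFoldr (\x _ -> x) 0 [1..]

-- The recursive result is demanded only while x <= 5.
upToFive :: [Integer]
upToFive = myFoldr (\x r -> if x > 5 then [] else x : r) [] [1..]

-- The result is produced incrementally, so a finite prefix can be taken.
doubled :: [Integer]
doubled = take 3 (myFoldr (\x r -> 2 * x : r) [] [1..])

main :: IO ()
main = do
  print firstElem   -- 1
  print upToFive    -- [1,2,3,4,5]
  print doubled     -- [2,4,6]
```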
The lists being merged are infinite, but that doesn't matter.
What matters is that you only have a finite number of lists being merged, and so to compute the next element of the merge you only need to perform a finite number of comparisons.
To compute the head of merge xs ys you only need to compute the head of xs and the head of ys. So by induction, if you have a finite tree of merge operations, you can compute the head of the overall merge in finite time.
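As a concrete instance of a finite tree of merges over infinite lists, here is a sketch using three arithmetic progressions instead of prime powers (the name residues is made up for this example). Each progression is infinite, yet any finite prefix of the merged result is computed in finite time.

```haskell
module Main where

merge :: Ord t => [t] -> [t] -> [t]
merge x [] = x
merge [] y = y
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Three infinite, ascending lists that together cover 1,2,3,...
residues :: [[Integer]]
residues = [ [1,4..], [2,5..], [3,6..] ]

main :: IO ()
main = print (take 10 (foldr merge [] residues))   -- [1,2,3,4,5,6,7,8,9,10]
```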
It is true that merge needs to completely scan its whole input lists to produce its whole output. However, the key point is that every element in the output depends only on finite prefixes of the input lists.
For instance, consider take 10 (map (*2) [1..]). To compute the first 10 elements, you do not need to examine the whole [1..]. Indeed, map will not scan the whole infinite list and "after that" start returning the output: if it behaved like that, it would simply hang on infinite lists. This "streaming" property of map is given by laziness and the map definition
map f [] = []
map f (x:xs) = f x : map f xs
The last line reads "yield f x, and then proceed with the rest", so the caller gets to inspect f x before map produces its whole output. By comparison
map f xs = go xs []
  where go [] acc = acc
        go (x:xs) acc = go xs (acc ++ [f x])
would be another definition of map which would start generating its output only after its input has been consumed. It is equivalent on finite lists (performance aside), but not equivalent on infinite ones (it hangs on infinite lists).
If you want to empirically test that your merge is indeed working lazily, try this:
take 10 $ merge (10:20:30:error "end of 1") (5:15:25:35:error "end of 2")
Feel free to play by changing the constants. You will see an exception being printed on screen, but only after a few list elements have already been produced by merge.
[map (^i) primes | i <- [1..3]] returns just a thunk. Nothing is evaluated for now. You could try this:
xs = [x | x <- [1..], error ""]
main = print $ const 0 xs
This program prints 0, so error "" wasn't evaluated here.
You can think about foldr being defined like this:
foldr f z [] = z
foldr f z (x:xs) = f x (foldr f z xs)
Then
primepowers n = foldr merge [] [map (^i) primes | i <- [1..3]]
evaluates like this (after it was forced):
merge thunk1 (merge thunk2 (merge thunk3 []))
where thunkn is a suspended computation of the n-th powers of the primes. Now the first merge forces evaluation of thunk1 and merge thunk2 (merge thunk3 []), which are evaluated to weak head normal form (WHNF). Forcing merge thunk2 (merge thunk3 []) causes forcing thunk2 and merge thunk3 []. merge thunk3 [] reduces to thunk3 and then thunk3 is forced. So the expression becomes
merge (2 : thunk1') (merge (4 : thunk2') (8 : thunk3'))
Which, due to the definition of merge, reduces to
merge (2 : thunk1') (4 : merge thunk2' (8 : thunk3'))
And again:
2 : merge thunk1' (4 : merge thunk2' (8 : thunk3'))
Now merge forces thunk1', but not the rest of the expression, because it's already in WHNF:
2 : merge (3 : thunk1'') (4 : merge thunk2' (8 : thunk3'))
2 : 3 : merge thunk1'' (4 : merge thunk2' (8 : thunk3'))
2 : 3 : merge (5 : thunk1''') (4 : merge thunk2' (8 : thunk3'))
2 : 3 : 4 : merge (5 : thunk1''') (merge thunk2' (8 : thunk3'))
2 : 3 : 4 : merge (5 : thunk1''') (merge (9 : thunk2'') (8 : thunk3'))
2 : 3 : 4 : merge (5 : thunk1''') (8 : merge (9 : thunk2'') thunk3')
2 : 3 : 4 : 5 : merge thunk1''' (8 : merge (9 : thunk2'') thunk3')
...
Intuitively, only those values that are needed get evaluated. Read this for a better explanation.
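That merge only needs its arguments in weak head normal form can be checked with lists whose tails are deliberately undefined. This is a sketch; partialA and partialB are made-up names for the test inputs.

```haskell
module Main where

merge :: Ord t => [t] -> [t] -> [t]
merge x [] = x
merge [] y = y
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Lists that are defined only up to a short prefix; forcing the
-- undefined tail would abort the program.
partialA, partialB :: [Integer]
partialA = 2 : 3 : undefined
partialB = 4 : undefined

main :: IO ()
main = do
  -- Only the two head cells are inspected to produce the first element.
  print (head (merge partialA partialB))     -- 2
  -- The second element needs one more cell of partialA, nothing else.
  print (take 2 (merge partialA partialB))   -- [2,3]
```

Asking for a third element would force the undefined tail of partialA and fail, which shows exactly how far the evaluation has to go.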
You can also merge an infinite list of infinite lists. The simplest way would be:
interleave (x:xs) ys = x : interleave ys xs
primepowers = foldr1 interleave [map (^i) primes | i <- [1..]]
The interleave function interleaves two infinite lists; for example, interleave [1,3..] [2,4..] is equal to [1..]. So take 20 primepowers gives you [2,4,3,8,5,9,7,16,11,25,13,27,17,49,19,32,23,121,29,125]. But this list is unordered; we can do better.
[map (^i) primes | i <- [1..]] reduces to
[[2,3,5...]
,[4,9,25...]
,[8,27,125...]
...
]
We have the precondition that in every n-th list there are elements that are smaller than the head of the (n+1)-th list. We can extract such elements from the first list (2 and 3 are smaller than 4), and now we have this:
[[5,7,11...]
,[4,9,25...]
,[8,27,125...]
...
]
The precondition doesn't hold, so we must fix this and swap the first list and the second:
[[4,9,25...]
,[5,7,11...]
,[8,27,125...]
...
]
Now we extract 4 and swap the first list and the second:
[[5,7,11...]
,[9,25,49...]
,[8,27,125...]
...
]
But the precondition doesn't hold, since there are elements in the second list (9) that are not smaller than the head of the third list (8). So we do the same trick again:
[[5,7,11...]
,[8,27,125...]
,[9,25,49...]
...
]
And now we can extract elements again. Repeating the process infinitely gives us an ordered list of prime powers. Here is the code:
swap xs@(x:_) xss = xss1 ++ xs : xss2 where
  (xss1, xss2) = span ((< x) . head) xss
mergeAll (xs:xss@((x:_):_)) = xs1 ++ mergeAll (swap xs2 xss) where
  (xs1, xs2) = span (< x) xs
primepowers = mergeAll [map (^i) primes | i <- [1..]]
For example, take 20 primepowers is equal to [2,3,4,5,7,8,9,11,13,16,17,19,23,25,27,29,31,32,37,41].
This is probably not the nicest way of obtaining an ordered list of prime powers, but it's a fairly easy one.
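For anyone who wants to run the swap/mergeAll approach as a standalone program, a sketch follows. The primes definition is an assumed stand-in (the post does not show one), and isAscending is a hypothetical helper added only to check the ordering of a prefix.

```haskell
module Main where

-- Assumption: stand-in for the `primes` list used in the post.
primes :: [Integer]
primes = 2 : filter isPrime [3,5..]
  where
    isPrime n = all (\p -> n `mod` p /= 0) (takeWhile (\p -> p * p <= n) primes)

-- Re-insert xs among the remaining lists so that heads stay ascending.
swap :: Ord a => [a] -> [[a]] -> [[a]]
swap xs@(x:_) xss = xss1 ++ xs : xss2
  where
    (xss1, xss2) = span ((< x) . head) xss

-- Emit everything in the first list that is below the next list's head,
-- then put the remainder of the first list back in its proper place.
mergeAll :: Ord a => [[a]] -> [a]
mergeAll (xs:xss@((x:_):_)) = xs1 ++ mergeAll (swap xs2 xss)
  where
    (xs1, xs2) = span (< x) xs

primepowers :: [Integer]
primepowers = mergeAll [ map (^i) primes | i <- [1 :: Int ..] ]

-- Hypothetical helper: is the list strictly increasing?
isAscending :: Ord a => [a] -> Bool
isAscending ys = and (zipWith (<) ys (tail ys))

main :: IO ()
main = do
  print (take 20 primepowers)
  print (isAscending (take 200 primepowers))
```

Both functions are partial (they assume non-empty, infinite inner lists), which is acceptable for this particular input but would need extra cases for general use.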
EDIT
Look at Will Ness's answer for a better solution, which is both easier and nicer.