
Optimization of Haskell code

I wrote the following Haskell code, which takes a triplet (x,y,z) and a list of triplets [(Int,Int,Int)] and checks whether the list contains a triplet (a,b,c) such that x == a and y == b. If it does, I just need to update c = c + z; if there is no such triplet in the list, I just add the triplet to the list.

insertEdge :: (Int,Int,Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdge (x,y,z) cs =
  if (length [(a,b,c) | (a,b,c) <- cs, a /= x || b /= y]) == (length cs)
    then ((x,y,z):cs)
    else [if (a == x && b == y) then (a,b,c+z) else (a,b,c) | (a,b,c) <- cs]

After profiling my code, it appears that this function takes 65% of the execution time.

How can I rewrite my code to be more efficient?

Other answers are correct, so I want to offer some unasked-for advice instead: how about using Data.Map (Int,Int) Int instead of a list?

Then your function becomes insertWith (+) (a,b) c mymap
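If all the edges are available up front, the same idea can be sketched with M.fromListWith, which sums the values of duplicate keys while the map is built. This is only an illustration: the names EdgeMap, fromEdges and toEdges are made up here and do not come from the question.

```haskell
import qualified Data.Map as M

-- Hypothetical alias: the (x, y) pair is the key, z is the accumulated value.
type EdgeMap = M.Map (Int, Int) Int

-- Build the whole map in one go; values of duplicate (x, y) keys are summed.
fromEdges :: [(Int, Int, Int)] -> EdgeMap
fromEdges es = M.fromListWith (+) [((x, y), z) | (x, y, z) <- es]

-- Convert back to a list of triplets, in ascending key order.
toEdges :: EdgeMap -> [(Int, Int, Int)]
toEdges m = [(x, y, z) | ((x, y), z) <- M.toList m]
```

Note that toEdges returns the triplets sorted by key, so the order will generally differ from that of the list-based version.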

The first thing that jumps out at me is the conditional: length examines the entire list, so in the worst-case scenario (updating the last element) your function traverses the list three times: once for the length of the filtered list, once for the length of cs, and once to find the element to update.
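To make the point concrete, here is a rough sketch of a version that walks the list only once and stops at the first match. The name insertEdgeOnce is invented for this example, and it assumes each (x,y) pair occurs at most once in the list. Unlike the original, it appends a new triplet at the end rather than at the head.

```haskell
-- Hypothetical single-pass variant: one traversal, stopping at the first match.
insertEdgeOnce :: (Int, Int, Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdgeOnce (x, y, z) = go
  where
    -- Reached the end without a match: the new triplet goes last.
    go [] = [(x, y, z)]
    go (e@(a, b, c) : rest)
      -- Match: update the third component and keep the tail untouched.
      | a == x && b == y = (a, b, c + z) : rest
      | otherwise        = e : go rest
```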

However, even getting rid of the extra traversals, the best you can do with the function as written will usually require a traversal of most of the list. From the name of the function and how much time was being spent in it, I'm guessing you're calling this repeatedly to build up a data structure? If so, you should strongly consider using a more efficient representation.

For instance, a quick and easy improvement would be to use Data.Map, with the first two elements of the triplet in a 2-tuple as the key and the third element as the value. That way you can avoid making so many linear-time lookups and redundant traversals.

As a rule of thumb, lists in Haskell are only an appropriate data structure when all you do is either walk sequentially down the list a few times (ideally, just once) or add/remove from the head of the list (i.e., using it like a stack). If you're searching, filtering, updating elements in the middle, or (worst of all) indexing by position, using lists will only end in tears.

Here's a quick example, if that helps:

import qualified Data.Map as M

incEdge :: M.Map (Int, Int) Int -> ((Int, Int), Int) -> M.Map (Int, Int) Int
incEdge cs (k,v) = M.alter f k cs
    where f (Just n) = Just $ n + v
          f Nothing  = Just v

The alter function is just insert/update/delete all rolled into one. Here it inserts the key into the map if it's not there, and sums the values if the key does exist. To build up a structure incrementally, you can do something like foldl incEdge M.empty edgeList. Testing this out, for a few thousand random edges your version with a list takes several seconds, whereas the Data.Map version is pretty much immediate.
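For completeness, a small sketch of that incremental build. incEdge is repeated so the snippet stands on its own; buildEdges is a name invented for this example, and it uses foldl' from Data.List rather than foldl, purely as a habit to avoid building up thunks in the accumulator.

```haskell
import Data.List (foldl')
import qualified Data.Map as M

-- Same function as above, repeated so this snippet is self-contained.
incEdge :: M.Map (Int, Int) Int -> ((Int, Int), Int) -> M.Map (Int, Int) Int
incEdge cs (k, v) = M.alter f k cs
  where
    f (Just n) = Just (n + v)  -- key present: add to the stored value
    f Nothing  = Just v        -- key absent: insert it

-- Hypothetical helper: fold a list of (key, value) edges into a map.
buildEdges :: [((Int, Int), Int)] -> M.Map (Int, Int) Int
buildEdges = foldl' incEdge M.empty
```

In GHCi, M.toList (buildEdges [((1,2),3), ((2,3),1), ((1,2),4)]) gives [((1,2),7),((2,3),1)].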

It's always a good idea to benchmark (and Criterion makes it so easy). Here are the results for the original solution (insertEdgeO), Geoff's foldr (insertEdgeF), and Data.Map (insertEdgeM):

benchmarking insertEdgeO...
mean: 380.5062 ms, lb 379.5357 ms, ub 381.1074 ms, ci 0.950

benchmarking insertEdgeF...
mean: 74.54564 ms, lb 74.40043 ms, ub 74.71190 ms, ci 0.950

benchmarking insertEdgeM...
mean: 18.12264 ms, lb 18.03029 ms, ub 18.21342 ms, ci 0.950

Here's the code (I compiled with -O2):

module Main where
import Criterion.Main
import Data.List (foldl')
import qualified Data.Map as M

insertEdgeO :: (Int, Int, Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdgeO (x, y, z) cs =
  if length [(a, b, c) | (a, b, c) <- cs, a /= x || b /= y] == length cs
    then (x, y, z) : cs
    else [if (a == x && b == y) then (a, b, c + z) else (a, b, c) | (a, b, c) <- cs]

insertEdgeF :: (Int, Int, Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdgeF (x,y,z) cs =
  case foldr f (False, []) cs of
    (False, cs') -> (x, y, z) : cs'
    (True, cs')  -> cs'
  where
    f (a, b, c) (e, cs')
      | (a, b) == (x, y) = (True, (a, b, c + z) : cs')
      | otherwise        = (e, (a, b, c) : cs')

insertEdgeM :: (Int, Int, Int) -> M.Map (Int, Int) Int -> M.Map (Int, Int) Int
insertEdgeM (a, b, c) = M.insertWith (+) (a, b) c

testSet n = [(a, b, c) | a <- [1..n], b <- [1..n], c <- [1..n]]

testO = foldl' (flip insertEdgeO) [] . testSet
testF = foldl' (flip insertEdgeF) [] . testSet
testM = triplify . M.toDescList . foldl' (flip insertEdgeM) M.empty . testSet
  where
    triplify = map (\((a, b), c) -> (a, b, c))

main = let n = 25 in defaultMain
  [ bench "insertEdgeO" $ nf testO n
  , bench "insertEdgeF" $ nf testF n
  , bench "insertEdgeM" $ nf testM n
  ]

You can improve insertEdgeF a bit by using foldl' (55.88634 ms), but Data.Map still wins.
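The foldl' variant is not shown, so the following is only a guess at what it might look like; the name insertEdgeF' and the exact shape are assumptions, not the benchmarked code. One visible difference from the foldr version is that the accumulated list comes out in reverse order.

```haskell
import Data.List (foldl')

-- Hypothetical foldl'-based variant of insertEdgeF (not the benchmarked code).
insertEdgeF' :: (Int, Int, Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdgeF' (x, y, z) cs =
  case foldl' f (False, []) cs of
    (False, cs') -> (x, y, z) : cs'  -- no match seen: add the new triplet
    (True, cs')  -> cs'              -- a match was updated along the way
  where
    -- The accumulator is (found flag, elements seen so far in reverse order).
    f (e, acc) (a, b, c)
      | (a, b) == (x, y) = (True, (a, b, c + z) : acc)
      | otherwise        = (e, (a, b, c) : acc)
```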

The main reason your function is slow is that it traverses the list at least twice, maybe three times. The function can be rewritten to traverse the list only once using a fold. This transforms the list into a tuple (Bool,[(Int,Int,Int)]), where the Bool indicates whether there was a matching element in the list and the list is the transformed list.

insertEdge (x,y,z) cs = case foldr f (False,[]) cs of
                          (False,cs') -> (x,y,z):cs'
                          (True,cs')  -> cs' 
  where f (a,b,c) (e,cs') = if (a,b) == (x,y) then (True,(a,b,c+z):cs') else (e,(a,b,c):cs')

If you haven't seen foldr before, it has type

foldr :: (a -> b -> b) -> b -> [a] -> b

foldr embodies a pattern of recursive list processing: defining a base case and combining the current list element with the result from the rest of the list. Writing foldr f b xs is the same as applying to xs a function g with definition

g [] = b
g (x:xs) = f x (g xs)
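As a tiny illustration of that correspondence (the names sumRec and sumFold are made up for this example), summing a list can be written either way, taking f to be (+) and b to be 0:

```haskell
-- Explicit recursion: a base case plus a combining step.
sumRec :: [Int] -> Int
sumRec []     = 0
sumRec (x:xs) = x + sumRec xs

-- The same function as a fold: f is (+) and the base case b is 0.
sumFold :: [Int] -> Int
sumFold = foldr (+) 0
```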

Sticking with your data structure, you might write

type Edge = (Int,Int,Int)

insertEdge :: Edge -> [Edge] -> [Edge]
insertEdge t@(x,y,z) es =
  case break (abx t) es of
    (_, []) -> t : es
    (l, ((_,_,zold):r)) -> l ++ (x,y,z+zold) : r
  where abx (a1,b1,_) (a2,b2,_) = a1 == a2 && b1 == b2

No matter what language you're using, searching lists is always a red flag. When searching you want sublinear complexity (think: hashes, binary search trees, and so on). In Haskell, an implementation using Data.Map is

import Data.Map

type Edge = (Int,Int,Int)

type EdgeMap = Map (Int,Int) Int
insertEdge :: Edge -> EdgeMap -> EdgeMap
insertEdge (x,y,z) es = alter accumz (x,y) es
  where accumz Nothing = Just z
        accumz (Just zold) = Just (z + zold)

You may not be familiar with alter:

alter :: Ord k => (Maybe a -> Maybe a) -> k -> Map k a -> Map k a

O(log n). The expression (alter f k map) alters the value x at k, or absence thereof. alter can be used to insert, delete, or update a value in a Map. In short: lookup k (alter f k m) = f (lookup k m).

let f _ = Nothing
alter f 7 (fromList [(5,"a"), (3,"b")]) == fromList [(3, "b"), (5, "a")]
alter f 5 (fromList [(5,"a"), (3,"b")]) == singleton 3 "b"

let f _ = Just "c"
alter f 7 (fromList [(5,"a"), (3,"b")]) == fromList [(3, "b"), (5, "a"), (7, "c")]
alter f 5 (fromList [(5,"a"), (3,"b")]) == fromList [(3, "b"), (5, "c")]

But as ADEpt shows in another answer, this is a bit of overengineering.

In

insertEdgeM :: (Int, Int, Int) -> M.Map (Int, Int) Int -> M.Map (Int, Int) Int
insertEdgeM (a, b, c) = M.insertWith (+) (a, b) c

you want to use the strict version of insertWith, namely insertWith'.
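On newer versions of the containers package, the value-strict operations live in the Data.Map.Strict module, and as far as I know insertWith' has since been deprecated and dropped from Data.Map, so check this against the version you have installed. A sketch using that module (insertEdgeS is a name made up here):

```haskell
import qualified Data.Map.Strict as MS

-- Hypothetical strict counterpart of insertEdgeM: the summed value is
-- evaluated on insertion instead of being left as a chain of (+) thunks.
insertEdgeS :: (Int, Int, Int) -> MS.Map (Int, Int) Int -> MS.Map (Int, Int) Int
insertEdgeS (a, b, c) = MS.insertWith (+) (a, b) c
```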

Very small optimisation: use an as-pattern, which avoids multiple reconstructions of the same tuple. Like this:

insertEdge xyz@(x,y,z) cs =
  if (length [abc | abc@(a,b,c) <- cs, a /= x || b /= y]) == (length cs)
    then (xyz:cs)
    else [if (a == x && b == y) then (a,b,c+z) else abc' | abc'@(a,b,c) <- cs]

You should apply the other optimization hints first, but this may save a very small amount of time, since the tuple doesn't have to be reconstructed again and again. At least, that holds for the last as-pattern; the first two are not important, since the tuple never gets evaluated in the first case and the as-pattern is only applied once in the second case.
