
Understanding groupSort and groupOn functions in Data.List.Extra library

These are the definitions of groupSort and groupOn in the Data.List.Extra module:

groupSort :: Ord k => [(k, v)] -> [(k, [v])]
groupSort = map (\x -> (fst $ head x, map snd x)) . groupOn fst . sortOn fst
groupOn :: Eq b => (a -> b) -> [a] -> [[a]]
groupOn f = groupBy ((==) `on2` f)
    -- redefine on so we avoid duplicate computation for most values.
    where (.*.) `on2` f = \x -> let fx = f x in \y -> fx .*. f y

I would like to know:

  • What's the meaning of the comma in (fst $ head x, map snd x) ?
  • What's the meaning of (.*.) inside the definition of groupOn ?
  • Why is function on redefined ? Why is duplicate computation avoided ?

What's the meaning of the comma in (fst $ head x, map snd x) ?

It's just a tuple, exactly the same as in something like (1, 2). (It might be clearer to rewrite it as (fst (head x), map snd x).)

What's the meaning of (.*.) inside the definition of groupOn ?

It's just a particularly clever (though not particularly readable) way of naming the first argument of on2. That is, the following two definitions are equivalent:

(.*.) `on2` f = \x -> let fx = f x in \y -> fx .*. f y
g     `on2` f = \x -> let fx = f x in \y -> fx `g` (f y)
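
To see that an operator-style name in parameter position is just an ordinary variable, here is a minimal sketch. The names applyOp and applyFn are hypothetical, invented for this illustration; they are not part of Data.List.Extra.

```haskell
-- The first parameter is a variable that happens to have an operator name,
-- so it can be used infix without backticks.
applyOp :: (a -> b -> c) -> a -> b -> c
applyOp (.*.) x y = x .*. y

-- The same function with an alphanumeric parameter name,
-- used infix by wrapping it in backticks.
applyFn :: (a -> b -> c) -> a -> b -> c
applyFn g x y = x `g` y
```

Both applyOp (+) 2 3 and applyFn (+) 2 3 evaluate to 5; the only difference is how the parameter is spelled.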

Why is function on redefined ? Why is duplicate computation avoided ?

I'm honestly not sure… I can see no particular advantages to redefining on here. (I'm not even sure what 'duplicate computation' they're talking about! If someone else has any idea why they're doing this, feel free to add a comment.)

What's the meaning of the comma in (fst $ head x, map snd x)

It constructs a 2-tuple, with fst (head x) as first item, and map snd x as second item. The lambda expression \x -> (fst $ head x, map snd x) thus maps the list of 2-tuples in a group to a single 2-tuple, where the first item is the first item (the key) of the first 2-tuple in the list, and the second item is a list of all the second items of the 2-tuples in x.
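
As a rough illustration of what that lambda does to each group, here is a sketch built only from base. groupSortSketch and example are hypothetical names, and the sketch uses Data.Function.on rather than the library's own groupOn, so it is a stand-in and not the actual Data.List.Extra code.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Simplified stand-in for groupSort: sort by key, group equal keys,
-- then collapse each group into a (key, values) tuple.
groupSortSketch :: Ord k => [(k, v)] -> [(k, [v])]
groupSortSketch = map collapse . groupBy ((==) `on` fst) . sortOn fst
  where
    -- groupBy never produces an empty group, so head is safe here.
    -- The comma builds the tuple: key of the first pair, all values of the group.
    collapse grp = (fst (head grp), map snd grp)

example :: [(Int, String)]
example = groupSortSketch [(2, 'b'), (1, 'a'), (2, 'c'), (1, 'd')]
```

Here example evaluates to [(1,"ad"),(2,"bc")]: after sorting, the group [(1,'a'),(1,'d')] becomes the single tuple (1,"ad").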

What's the meaning of (.*.) inside the definition of groupOn ?

It is the first operand of the on2 function that is defined in the where clause. One could have defined it with:

where on2 g f = \x -> let fx = f x in \y -> g fx (f y)

Here we renamed the first parameter to g, and since g is not an operator, we do not use infix notation like fx .*. f y, but write g fx (f y) instead.

In groupOn, on2 is applied to (==), so (.*.) is the same as (==), and this means that (==) `on2` f is the same as:

(==) `on2` f = \x -> let fx = f x in \y -> fx == f y

It is thus a function that determines when two items are considered to be in the same group.
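
To make this concrete, here is a small sketch that copies the definition from the question under the hypothetical name groupOnSketch (so it does not need the extra package) and applies it to a sample list. The decades example is invented for illustration.

```haskell
import Data.List (groupBy)

-- Copy of groupOn from the question, including the local on2 helper.
groupOnSketch :: Eq b => (a -> b) -> [a] -> [[a]]
groupOnSketch f = groupBy ((==) `on2` f)
  where
    -- (.*.) is just the name of the first parameter; here it is bound to (==).
    (.*.) `on2` g = \x -> let gx = g x in \y -> gx .*. g y

-- Adjacent numbers end up in the same group when they share the same tens digit.
decades :: [[Int]]
decades = groupOnSketch (`div` 10) [1, 5, 12, 18, 25, 3]
```

decades evaluates to [[1,5],[12,18],[25],[3]]. Note that 3 starts a new group even though it has the same key as 1 and 5: like groupBy, groupOn only merges adjacent elements, which is why groupSort sorts first.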

Why is function on redefined? Why is duplicate computation avoided?

It aims to avoid computing f on the first item of a group multiple times. It computes f x once and stores the result in a variable fx, so it is not recomputed for each subsequent element that is checked for membership of the same group.

The groupBy function is implemented as:

groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

Here it calculates eq x once per group to obtain the predicate that is matched against the following elements. Because eq x is applied to the first item of the group being constructed, the binding let fx = f x in … is shared by every comparison within that group, so f x is not calculated again.

Of course on2 will still evaluate f y for each of the next elements that is tested against the group. This means that if you write:

groupOn (+1) [1,1,1,1,1,2,2,2,2,2,1,1,1,1]

then for the following elements of the list (the first element of each group), f is evaluated as fx:

             [1,        2,        1      ]

For the following elements f is evaluated as f y:

             [  1,1,1,1,2,2,2,2,2,1,1,1,1]

as the second operand. So we still evaluate f on each element of the list at least once, and on the first element of each new group (except the first group) we evaluate it twice.

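For a list of n elements with n>0 that is divided into g groups, it will thus evaluate f n+g-1 times (one fewer if the last group consists of a single element, since fx is bound lazily and is never demanded for that group).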
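
One way to check this count empirically is to instrument the key function with a counter. The following is only a sketch: groupOnShared, groupOnNaive and countCalls are hypothetical names, and unsafePerformIO is used purely as instrumentation, so the numbers assume GHC does not optimize the calls away (an unoptimized build is the safest way to try it).

```haskell
import Control.Exception (evaluate)
import Data.IORef (modifyIORef', newIORef, readIORef)
import Data.List (groupBy)
import System.IO.Unsafe (unsafePerformIO)

-- Same sharing trick as on2: f x is bound once per group.
groupOnShared :: Eq b => (a -> b) -> [a] -> [[a]]
groupOnShared f = groupBy (\x -> let fx = f x in \y -> fx == f y)

-- Naive variant, equivalent to using Data.Function.on:
-- f x is recomputed for every comparison.
groupOnNaive :: Eq b => (a -> b) -> [a] -> [[a]]
groupOnNaive f = groupBy (\x y -> f x == f y)

-- Counts how often the key function (+1) is evaluated while grouping xs.
countCalls :: ((Int -> Int) -> [Int] -> [[Int]]) -> [Int] -> IO Int
countCalls grouper xs = do
  ref <- newIORef (0 :: Int)
  -- Instrumented version of (+1): bumps the counter each time it is evaluated.
  let f x = unsafePerformIO (modifyIORef' ref (+ 1) >> pure (x + 1))
  -- Force the spine of every group so that all comparisons actually run.
  _ <- evaluate (length (concat (grouper f xs)))
  readIORef ref
```

For the list [1,1,1,1,1,2,2,2,2,2,1,1,1,1] (n = 14, g = 3), countCalls groupOnShared should report 16 evaluations, against 26 for countCalls groupOnNaive, which evaluates f twice for each of the 13 comparisons.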


 