
F# deleting common elements in lists

I'm trying to make a list with only one copy of each element of the original list.

For example, [1;2;3;3;2] would become [1;2;3], and ["hi";"the";"world";"hi"] would become ["hi";"the";"world"].

I'm using recursion and pattern matching, and not using the list modules.

Here is my attempt and thinking: I want to go through the list and look at the head. If that element also exists in the tail of the list, I want to keep it once and remove the remaining copies from the rest of the list.

let rec common l =
  match l with
  | head :: tail -> if head = tail then head :: [] else head :: isolate(tail)
  | [] -> []
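As written, the attempt above does not compile: head = tail compares an element with a list, and isolate is never defined. A minimal sketch of the idea described in the question, using only recursion and pattern matching, might look like the following. The helper name removeAll is hypothetical (it is not from the original post), and the approach is quadratic and not tail-recursive, so it is meant for small lists only.

```fsharp
// Hypothetical helper: drop every occurrence of x from the list.
let rec removeAll x l =
  match l with
  | [] -> []
  | head :: tail when head = x -> removeAll x tail
  | head :: tail -> head :: removeAll x tail

// Keep the head once, strip its duplicates from the tail, then recurse.
let rec common l =
  match l with
  | [] -> []
  | head :: tail -> head :: common (removeAll head tail)

printfn "%A" (common [1;2;3;3;2])                 // [1; 2; 3]
printfn "%A" (common ["hi";"the";"world";"hi"])   // ["hi"; "the"; "world"]
```

Because each head is kept where it first appears, the original order is preserved.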

The first answer is very simple, but it uses an AVL tree, which has O(log n) insert complexity, allocates many internal pointers, and consumes a lot of memory per item:

let common l = l |> Set.ofList |> Set.toList

The timing results are below:

#time "on"
let mutable temp = Unchecked.defaultof<_>
for i = 0 to 1000000 do
  temp <- common [1;2;3;3;2;4;1;5;6;2;7;5;8;9;3;2;10]
  ()
Real: 00:00:03.328, CPU: 00:00:03.276, GC gen0: 826, gen1: 0, gen2: 0
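The #time directive is only available in F# Interactive. In a compiled program, a comparable measurement could be sketched with System.Diagnostics.Stopwatch. The helper name timeIt is hypothetical, and the iteration count here is smaller than in the run above, so the numbers are not comparable to the timings quoted in this answer.

```fsharp
open System.Diagnostics

// Hypothetical helper: call f `iterations` times (at least once) and
// return the last result together with the total elapsed time.
let timeIt (iterations: int) (f: unit -> 'a) : 'a * System.TimeSpan =
  let sw = Stopwatch.StartNew()
  let mutable result = f ()
  for _ in 2 .. iterations do
    result <- f ()
  sw.Stop()
  result, sw.Elapsed

// The Set-based version from the first answer.
let common l = l |> Set.ofList |> Set.toList

let result, elapsed =
  timeIt 100000 (fun () -> common [1;2;3;3;2;4;1;5;6;2;7;5;8;9;3;2;10])
printfn "%A in %O" result elapsed
```

Unlike #time, this reports wall-clock time only and says nothing about GC collections.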

An AVL tree is also sorted, so this approach does not preserve the original order and returns the elements sorted, e.g.

common [1;2;3;3;2;4;1;5;6;2;7;5;10;8;9;3;2]
val it : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]

SCG.HashSet (System.Collections.Generic.HashSet) is an imperative collection with O(1) insert/lookup and lower memory use per item. It is well suited to privately tracking which values have already been seen. Using it, one could write the common function as:

open System.Collections.Generic
let common (l:'T list) =
  let set = HashSet()
  let rec commonAux (input:'T list) (acc:'T list) : 'T list =
    match input with
    | head :: tail -> 
      if set.Add(head) then
        commonAux tail (head :: acc)
      else commonAux tail acc
    | [] -> acc
  commonAux l []
  |> List.rev

Or, even simpler:

let common (l:'T list) =
  let set = HashSet()
  List.fold (fun st t ->
    if set.Add(t) then t :: st
    else st
    ) [] l
  |> List.rev

The timings for the two are nearly identical:

Real: 00:00:01.105, CPU: 00:00:01.092, GC gen0: 722, gen1: 1, gen2: 0
Real: 00:00:01.168, CPU: 00:00:01.170, GC gen0: 730, gen1: 0, gen2: 0

So using List.fold with a HashSet is very simple, fast and order-preserving. This is a good example of where the ability to use private mutable state is a blessing in F#: it is much faster than the pure functional solutions, while the outer function remains "pure functional" with no observable side effects.
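If the List module is acceptable, the same order-preserving behaviour is also available from List.distinct, which keeps the first occurrence of each element. This is a minimal sketch that assumes a FSharp.Core version providing List.distinct (F# 4.0 or later, as far as I know); it was not part of the timings above, so no performance claim is made for it.

```fsharp
// List.distinct discards later occurrences and keeps the original order.
let common l = List.distinct l

printfn "%A" (common [1;2;3;3;2;4;1;5;6;2;7;5;10;8;9;3;2])
// [1; 2; 3; 4; 5; 6; 7; 10; 8; 9]
printfn "%A" (common ["hi";"the";"world";"hi"])
// ["hi"; "the"; "world"]
```

Note that 10 stays ahead of 8 and 9, unlike in the sorted output of the Set-based version.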

For completeness, we could implement the same fold logic using the AVL-based Set. It performs at about the same speed as the first answer, is "pure functional", and keeps the original order:

let common (l:'T list) =
  let rec commonAux (input:'T list) (s) (acc:'T list) : 'T list =
    match input with
    | head :: tail -> 
      if Set.contains head s then commonAux tail s acc
      else 
        commonAux tail (Set.add head s) (head :: acc)
    | [] -> acc
  commonAux l Set.empty []
  |> List.rev

Real: 00:00:02.825, CPU: 00:00:02.808, GC gen0: 908, gen1: 1, gen2: 0

PS: Using let common (l:'T list) = HashSet(l) |> List.ofSeq does not guarantee the order of elements and is about 2x slower than the fold solution.

PPS: The timing for the second answer is:

Real: 00:00:07.504, CPU: 00:00:07.394, GC gen0: 1521, gen1: 1, gen2: 0

I would just convert to a set and back

let common l =
  l |> Set.ofList |> Set.toList


 