简体   繁体   中英

How to remove duplicates strings or int from Slice in Go

Let's say I have a list of student cities and the size of it could be 100 or 1000, and I want to filter out all duplicates cities.

I want a generic solution that I can use to remove all duplicate strings from any slice.

I am new to Go Language, So I tried to do it by looping and checking if the element exists using another loop function.

Students' Cities List (Data):

studentsCities := []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}

Functions that I created, and it's doing the job:

func contains(s []string, e string) bool {
    for _, a := range s {
        if a == e {
            return true
        }
    }
    return false
}

func removeDuplicates(strList []string) []string {
    list := []string{}
    for _, item := range strList {
        fmt.Println(item)
        if contains(list, item) == false {
            list = append(list, item)
        }
    }
    return list
}

My solution test

func main() {
    studentsCities := []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}

    uniqueStudentsCities := removeDuplicates(studentsCities)
    
    fmt.Println(uniqueStudentsCities) // Expected output [Mumbai Delhi Ahmedabad Bangalore Kolkata Pune]
}

I believe that the above solution that I tried is not an optimum solution . Therefore, I need help from you guys to suggest the fastest way to remove duplicates from the slice?

I checked StackOverflow, this question is not being asked yet, so I didn't get any solution.

I found Burak's and Fazlan's solution helpful. Based on that, I implemented the simple functions that help to remove or filter duplicate data from slices of strings, integers, or any other types with generic approach.

Here are my three functions, first is generic, second one for strings and last one for integers of slices. You have to pass your data and return all the unique values as a result.

Generic solution: => Go v1.18

func removeDuplicate[T string | int](sliceList []T) []T {
    allKeys := make(map[T]bool)
    list := []T{}
    for _, item := range sliceList {
        if _, value := allKeys[item]; !value {
            allKeys[item] = true
            list = append(list, item)
        }
    }
    return list
}

To remove duplicate strings from slice:

func removeDuplicateStr(strSlice []string) []string {
    allKeys := make(map[string]bool)
    list := []string{}
    for _, item := range strSlice {
        if _, value := allKeys[item]; !value {
            allKeys[item] = true
            list = append(list, item)
        }
    }
    return list
}

To remove duplicate integers from slice:

func removeDuplicateInt(intSlice []int) []int {
    allKeys := make(map[int]bool)
    list := []int{}
    for _, item := range intSlice {
        if _, value := allKeys[item]; !value {
            allKeys[item] = true
            list = append(list, item)
        }
    }
    return list
}

You can update the slice type, and it will filter out all duplicates data for all types of slices.

Here is the GoPlayground link: https://go.dev/play/p/iyb97KcftMa

You can do in-place replacement guided with a map:

processed := map[string]struct{}{}
w := 0
for _, s := range cities {
    if _, exists := processed[s]; !exists {
        // If this city has not been seen yet, add it to the list
        processed[s] = struct{}{}
        cities[w] = s
        w++
    }
}
cities = cities[:w]

Adding this answer which worked for me, does require/include sorting, however.

func removeDuplicateStrings(s []string) []string {
    if len(s) < 1 {
        return s
    }

    sort.Strings(s)
    prev := 1
    for curr := 1; curr < len(s); curr++ {
        if s[curr-1] != s[curr] {
            s[prev] = s[curr]
            prev++
        }
    }

    return s[:prev]
}

For fun, I tried using generics. (Go 1.18+ only)

type SliceType interface {
    ~string | ~int | ~float64 // add more *comparable* types as needed
}

func removeDuplicates[T SliceType](s []T) []T {
    if len(s) < 1 {
        return s
    }

    // sort
    sort.SliceStable(s, func(i, j int) bool {
        return s[i] < s[j]
    })

    prev := 1
    for curr := 1; curr < len(s); curr++ {
        if s[curr-1] != s[curr] {
            s[prev] = s[curr]
            prev++
        }
    }

    return s[:prev]
}

Go Playground Link with tests: https://go.dev/play/p/bw1PP1osJJQ

Simple to understand.

func RemoveDuplicate(array []string) []string {
    m := make(map[string]string)
    for _, x := range array {
        m[x] = x
    }
    var ClearedArr []string
    for x, _ := range m {
        ClearedArr = append(ClearedArr, x)
    }
    return ClearedArr
}

reduce memory usage:

package main

import (
    "fmt"
    "reflect"
)

type void struct{}

func main() {

    digits := [6]string{"one", "two", "three", "four", "five", "five"}

    set := make(map[string]void)

    for _, element := range digits {
        set[element] = void{}
    }

    fmt.Println(reflect.ValueOf(set).MapKeys())
}

ps playground

If you want to don't waste memory allocating another array for copy the values, you can remove in place the value, as following:

package main

import "fmt"

var studentsCities = []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}

func contains(s []string, e string) bool {
    for _, a := range s {
        if a == e {
            return true
        }
    }
    return false
}

func main() {
    fmt.Printf("Cities before remove: %+v\n", studentsCities)
    for i := 0; i < len(studentsCities); i++ {
        if contains(studentsCities[i+1:], studentsCities[i]) {
            studentsCities = remove(studentsCities, i)
            i--
        }
    }
    fmt.Printf("Cities after remove: %+v\n", studentsCities)
}
func remove(slice []string, s int) []string {
    return append(slice[:s], slice[s+1:]...)
}

Result:

Cities before remove: [Mumbai Delhi Ahmedabad Mumbai Bangalore Delhi Kolkata Pune]
Cities after remove: [Ahmedabad Mumbai Bangalore Delhi Kolkata Pune]
func UniqueNonEmptyElementsOf(s []string) []string {
    unique := make(map[string]bool, len(s))
    var us []string
    for _, elem := range s {
        if len(elem) != 0 {
            if !unique[elem] {
                us = append(us, elem)
                unique[elem] = true
            }
        }
    }

    return us
}

send the duplicated splice to the above function, this will return the splice with unique elements.

func main() {
    studentsCities := []string{"Mumbai", "Delhi", "Ahmedabad", "Mumbai", "Bangalore", "Delhi", "Kolkata", "Pune"}

    uniqueStudentsCities := UniqueNonEmptyElementsOf(studentsCities)
    
    fmt.Println(uniqueStudentsCities)
}

It can also be done with a set-like map:

ddpStrings := []string{}
m := map[string]struct{}{}

for _, s := range strings {
    if _, ok := m[scopeStr]; ok {
        continue
    }
    ddpStrings = append(ddpStrings, s)
    m[s] = struct{}{}
}

Here's a mapless index based slice's duplicate "remover"/trimmer. It use a sort method.

The n value is always 1 value lower than the total of non duplicate elements that's because this methods compare the current (consecutive/single) elements with the next (consecutive/single) elements and there is no matches after the lasts so you have to pad it to include the last.

Note that this snippet doesn't empty the duplicate elements into a nil value. However since the n+1 integer start at the duplicated item's indexes, you can loop from said integer and nil the rest of the elements.

sort.Strings(strs)
for n, i := 0, 0; ; {
    if strs[n] != strs[i] {
        if i-n > 1 {
            strs[n+1] = strs[i]
        }
        n++
    }
    i++
    if i == len(strs) {
        if n != i {
            strs = strs[:n+1]
        }
        break
    }
}
fmt.Println(strs)

Based on Riyaz's solution , you can use generics since Go 1.18

func removeDuplicate[T string | int](tSlice []T) []T {
    allKeys := make(map[T]bool)
    list := []T{}
    for _, item := range tSlice {
        if _, value := allKeys[item]; !value {
            allKeys[item] = true
            list = append(list, item)
        }
    }
    return list
}

Generics minimizes code duplication.

Go Playground link: https://go.dev/play/p/Y3fEtHJpP7Q

So far @snassr has given the best answer as it is the most optimized way in terms of memory (no extra memory) and runtime (nlogn). But one thing I want to emphasis here is if we want to delete any index/element of an array we should loop from end to start as it reduces complexity. If we loop from start to end then if we delete nth index then we will accidentally miss the nth element (which was n+1th before deleting nth element) as in the next iteration we will get the n+1th element.

Example Code

func Dedup(strs []string) {
    sort.Strings(strs)
    for i := len(strs) - 1; i > 0; i-- {
        if strs[i] == strs[i-1] {
            strs = append(strs[:i], strs[i+1:]...)
        }
    }
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM