Combining multiple maps that are stored on channel (Same key's values get summed.) in Go

Question

My objective is to create a program that counts the occurrences of every unique word in a text file in a parallelized fashion. All occurrences have to be presented in a single map.

What I do here is read the text file into a string and then split it into an array of words. That array is then divided into two slices of equal length, which are fed concurrently to the mapper function.

func WordCount(text string) map[string]int {
    wg := new(sync.WaitGroup)
    s := strings.Fields(text)

    freq := make(map[string]int, len(s))
    channel := make(chan map[string]int, 2)

    wg.Add(1)
    go mappers(s[0:(len(s)/2)], freq, channel, wg)
    wg.Add(1)
    go mappers(s[(len(s)/2):], freq, channel, wg)
    wg.Wait()

    actualMap := <-channel

    return actualMap
}

func mappers(slice []string, occurrences map[string]int, ch chan map[string]int, wg *sync.WaitGroup)  {
    var l = sync.Mutex{}
    for _, word := range slice {
        l.Lock()
        occurrences[word]++
        l.Unlock()

    }
    ch <- occurrences
    wg.Done()
}

The bottom line is that when I run the code, I get a huge multiline error that starts with

fatal error: concurrent map writes

I thought I had guarded against this through mutual exclusion:

        l.Lock()
        occurrences[word]++
        l.Unlock()

What am I doing wrong here? Furthermore, how can I combine all the maps in a channel? By "combine" I mean that the values of the same key get summed in the new map.

Answer 1

The main problem is that you use a separate lock in each goroutine: every call to mappers creates its own sync.Mutex. That does nothing to serialize access to the map. The same lock has to be used in each goroutine.

And since you use the same map in each goroutine, you don't have to merge them, and you don't need a channel to deliver the result.

Even if you use the same mutex in each goroutine, this probably won't help performance: since you use a single map, the goroutines will have to compete with each other for the map's lock.
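
For illustration, here is a minimal sketch of what sharing one lock could look like. The names countShared and WordCountShared are hypothetical and not from the question; the only real change is that a single sync.Mutex is created once and passed to both goroutines by pointer. It removes the "concurrent map writes" error, but the goroutines still contend for the one lock.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// countShared is a hypothetical variant of mappers: every goroutine receives
// a pointer to the same mutex, so writes to the shared map are serialized.
func countShared(words []string, occurrences map[string]int, mu *sync.Mutex, wg *sync.WaitGroup) {
	defer wg.Done()
	for _, word := range words {
		mu.Lock()
		occurrences[word]++
		mu.Unlock()
	}
}

// WordCountShared counts words using one shared map guarded by one mutex.
// No channel and no merge step are needed because there is only one map.
func WordCountShared(text string) map[string]int {
	s := strings.Fields(text)
	freq := make(map[string]int)

	var mu sync.Mutex
	var wg sync.WaitGroup

	mid := len(s) / 2
	wg.Add(2)
	go countShared(s[:mid], freq, &mu, &wg)
	go countShared(s[mid:], freq, &mu, &wg)
	wg.Wait() // once Wait returns, no goroutine touches freq any more

	return freq
}

func main() {
	fmt.Println(WordCountShared("aa ab cd cd de ef a x cd aa"))
}
```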

You should create a separate map in each goroutine, use that to count locally, and then deliver the result map on the channel. This might give you a performance boost.

But then you don't need a lock, since each goroutine will have its own map which it can read/write without a mutex.

But then you do have to deliver the result on the channel, and then merge the results.

And since goroutines deliver results on the channel, the waitgroup becomes unnecessary.

func WordCount(text string) map[string]int {
    s := strings.Fields(text)

    channel := make(chan map[string]int, 2)

    go mappers(s[0:(len(s)/2)], channel)
    go mappers(s[(len(s)/2):], channel)

    total := map[string]int{}
    for i := 0; i < 2; i++ {
        m := <-channel
        for k, v := range m {
            total[k] += v
        }
    }

    return total
}

func mappers(slice []string, ch chan map[string]int) {
    occurrences := map[string]int{}
    for _, word := range slice {
        occurrences[word]++

    }
    ch <- occurrences
}

Example testing it:

fmt.Println(WordCount("aa ab cd cd de ef a x cd aa"))

Output (try it on the Go Playground):

map[a:1 aa:2 ab:1 cd:3 de:1 ef:1 x:1]

Also note that in theory this looks "good", but in practice you may still not achieve any performance boost: the goroutines do too "little" work, and launching them and merging the results requires effort which may outweigh the benefits.
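
If the input is large enough for parallelism to pay off, the same pattern extends to more than two goroutines. The following is only a sketch under that assumption: WordCountN, countChunk, and the worker-count parameter n are hypothetical names that do not appear in the question. Each goroutine still counts into its own map, and the receiver sums the values of equal keys while draining the channel.

```go
package main

import (
	"fmt"
	"strings"
)

// countChunk counts the words of one chunk in a private map and sends that
// map on ch. No lock is needed because the map is not shared.
func countChunk(words []string, ch chan<- map[string]int) {
	occurrences := map[string]int{}
	for _, word := range words {
		occurrences[word]++
	}
	ch <- occurrences
}

// WordCountN is a hypothetical generalization of WordCount that splits the
// words into n chunks and merges the n partial maps.
func WordCountN(text string, n int) map[string]int {
	s := strings.Fields(text)
	if n < 1 {
		n = 1
	}

	// Buffered so that every goroutine can send without blocking.
	ch := make(chan map[string]int, n)
	for i := 0; i < n; i++ {
		// Integer arithmetic spreads the words evenly; a chunk may be empty.
		start := i * len(s) / n
		end := (i + 1) * len(s) / n
		go countChunk(s[start:end], ch)
	}

	// Receive exactly n partial maps and sum the values of equal keys.
	total := map[string]int{}
	for i := 0; i < n; i++ {
		for k, v := range <-ch {
			total[k] += v
		}
	}
	return total
}

func main() {
	fmt.Println(WordCountN("aa ab cd cd de ef a x cd aa", 4))
}
```

Whether any value of n beats a plain sequential loop depends on the input size and the machine, so it is worth measuring before settling on one.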

solution1
1 ACCEPTED 2020-04-02 12:22:21
