
Golang - race condition when using goroutines

I ran my program with the race flag and it reported an issue :(

The function is the following:

func (g *WaitingErrorGroup) Start(run func() (bool, error)) {
    g.g.Start(func() {
        requeue, err := run()
        if g.err == nil {
            g.err = err
        }
        if requeue {
            g.requeue = requeue
        }
    })
}

The function is called as follows:

g.Start(func() (bool, error) {
    return install(vins, crObjectKey, releasePrefix, kFilePath, objectCli, dependenciesBroadcastingSchema, compStatus)
})

g.Start(func() (bool, error) {
    return false, uninstall(currentRelease, kFilePath, updateChartStatus)
})

The stack trace looks like this:

WARNING: DATA RACE
Read at 0x00c0001614a8 by goroutine 82:
  github.vs.sar/agm/coperator/components/tools.(*WaitingErrorGroup).Start.func1()
      /Users/github.vs.sar/agm/coperator/components/tools/waitgroup.go:27 +0x84
  k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
      /Users/i88893/go/pkg/mod/k8s.io/apimachinery@v0.22.4/pkg/util/wait/wait.go:73 +0x6d

The first frame is the Start function (my code): github.vs.sar/agm/coperator/components/tools.(*WaitingErrorGroup).Start.func1()

func (g *WaitingErrorGroup) Start(run func() (bool, error)) {
    g.g.Start(func() {
        requeue, err := run()
        if g.err == nil {
            g.err = err
        }
        if requeue {
            g.requeue = requeue
        }
    })
}

The second frame in the stack trace is this (not my code):

go/pkg/mod/k8s.io/apimachinery@v0.22.4/pkg/util/wait/wait.go:73 +0x6d

// Start starts f in a new goroutine in the group.
func (g *Group) Start(f func()) {
    g.wg.Add(1)
    go func() {
        defer g.wg.Done()
        f()
    }()
}

I guess (I am not a Go expert) that this is caused by g.err being read and written from multiple goroutines concurrently, which isn't allowed. The same applies to writing g.requeue. Any idea how to solve this?

Maybe I need to use https://pkg.go.dev/sync#RWMutex, but I am not sure how...

UPDATE

I took @Danil's suggestion (changing the lock position): I added a mutex to the struct and a lock in the function. Does this make sense? When I run with the race flag now, everything seems to be OK.

type WaitingErrorGroup struct {
    g       *wait.Group
    mu      sync.Mutex
    err     error
    requeue bool
}

func (g *WaitingErrorGroup) Start(run func() (bool, error)) {
    g.g.Start(func() {
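        requeue, err := run()
        // Lock only around the shared fields, not around run(),
        // so the started functions still execute concurrently.
        g.mu.Lock()
        defer g.mu.Unlock()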
        if g.err == nil {
            g.err = err
        }
        if requeue {
            g.requeue = requeue
        }
    })
}

You could use a channel to communicate the errors as they occur and handle them in one place.

For example, something like this.

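package main

import (
    "errors"
    "fmt"
    "sync"
)

// handleErrors prints every error received on c in a separate goroutine.
// The returned WaitGroup is done once c has been closed and drained.
func handleErrors(c chan error) *sync.WaitGroup {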
    wg := sync.WaitGroup{}
    wg.Add(1)
    go func() {
        defer wg.Done()
        for err := range c {
            fmt.Println(err)
        }
    }()
    return &wg
}

func main() {
    c := make(chan error, 2)
    wg := sync.WaitGroup{}

    defer handleErrors(c).Wait()
    defer close(c)
    defer wg.Wait()

    wg.Add(2)
    go func() {
        defer wg.Done()
        c <- errors.New("error 1")
    }()
    go func() {
        defer wg.Done()
        c <- errors.New("error 2")
    }()

}

I think using channels is more idiomatic in Go than other sync primitives like locks. Locks are harder to get right, and they can come with a performance cost.

If one goroutine holds the lock, the other goroutines have to wait until it is released, so you introduce a bottleneck in your concurrent execution. In the above example, this is avoided by buffering the channel: its capacity of 2 lets both goroutines send their error without blocking, even if nothing has read from the channel yet.

Additionally, with a lock it can happen that the lock is never released, for example if the programmer forgot to add the unlock call, which leads to a deadlock. Similar problems can occur with channels, though, for example when a channel is never closed and a receiver ranging over it blocks forever.

The problem appears because you manipulate unsynchronized shared memory (g.err and g.requeue in your case) from different goroutines.

To resolve this error, you need to synchronize access to these fields.

You can use sync.Mutex or sync.RWMutex.

In your case you will have:

func (g *WaitingErrorGroup) Start(run func() (bool, error)) {
    g.g.Start(func() {
        requeue, err := run()
        
        // Lock before reading and writing g.err and unlock after
        g.mu.Lock()
        defer g.mu.Unlock()
        
        if g.err == nil {
            g.err = err
        }
        if requeue {
            g.requeue = requeue
        }
    })
}

As for the suggestion to use sync.WaitGroup here: I am not sure it is the right option for you. sync.WaitGroup only synchronizes the completion of a collection of goroutines; it does not protect data shared between them. From the documentation:

A WaitGroup waits for a collection of goroutines to finish. The main goroutine calls Add to set the number of goroutines to wait for. Then each of the goroutines runs and calls Done when finished. At the same time, Wait can be used to block until all goroutines have finished.
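
A minimal sketch of that Add/Done/Wait pattern is shown below; the squares function is a made-up example, not part of the question. It is race-free only because every goroutine writes to its own slice element. If the goroutines all wrote to one shared variable, as with g.err, the WaitGroup alone would not prevent the race.

```go
package main

import (
	"fmt"
	"sync"
)

// squares computes i*i for every i in [0, n), one goroutine per element.
func squares(n int) []int {
	out := make([]int, n)
	var wg sync.WaitGroup
	wg.Add(n) // number of goroutines to wait for
	for i := 0; i < n; i++ {
		go func(i int) {
			defer wg.Done()
			out[i] = i * i // each goroutine writes a distinct element
		}(i)
	}
	wg.Wait() // blocks until every goroutine has called Done
	return out
}

func main() {
	fmt.Println(squares(4)) // prints: [0 1 4 9]
}
```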


 