
Why does my code work correctly when I run wg.Wait() inside a goroutine?

I have a list of URLs that I am scraping. I want to store all of the successfully scraped page data in a channel and, when I am done, dump it into a slice. I don't know how many fetches will succeed, so I cannot specify a fixed length. I expected the code to reach wg.Wait() and then wait until all the wg.Done() calls were made, but close(queue) was never reached. Looking for an answer to a similar problem, I came across this SO answer

https://stackoverflow.com/a/31573574/5721702

where the author does something similar:

ports := make(chan string)
toScan := make(chan int)
var wg sync.WaitGroup

// make 100 workers for dialing
for i := 0; i < 100; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for p := range toScan {
            ports <- worker(*host, p)
        }
    }()
}

// close our receiving ports channel once all workers are done
go func() {
    wg.Wait()
    close(ports)
}()

As soon as I wrapped my wg.Wait() inside the goroutine, close(queue) was reached:

urls := getListOfURLS()
activities := make([]Activity, 0, limit)
queue := make(chan Activity)
for i, activityURL := range urls {
    wg.Add(1)
    go func(i int, url string) {
        defer wg.Done()
        activity, err := extractDetail(url)
        if err != nil {
            log.Println(err)
            return
        }
        queue <- activity
    }(i, activityURL)
}
// calling it like this without the goroutine causes the execution to hang
// wg.Wait()
// close(queue)

// calling it like this successfully waits
go func() {
    wg.Wait()
    close(queue)
}()
for a := range queue {
    // block channel until valid url is added to queue
    // once all are added, close it
    activities = append(activities, a)
}

Why does the code not reach close(queue) if I don't use a goroutine for wg.Wait()? I would think that all of the defer wg.Done() statements would be called, so eventually it would clear up once execution gets to wg.Wait(). Does it have to do with receiving values in my channel?

You need to wait for the goroutines to finish in a separate goroutine, because queue needs to be read from. When you do the following:

queue := make(chan Activity)
for i, activityURL := range urls {
    wg.Add(1)
    go func(i int, url string) {
        defer wg.Done()
        activity, err := extractDetail(url)
        if err != nil {
            log.Println(err)
            return
        }
        queue <- activity // nothing is reading data from queue.
    }(i, activityURL)
}

wg.Wait() 
close(queue)

for a := range queue {
    activities = append(activities, a)
}

Each goroutine blocks at queue <- activity, since queue is unbuffered and nothing is reading from it: the range loop over queue only runs in the main goroutine after wg.Wait().

wg.Wait will only unblock once all the goroutines return. But, as mentioned, all the goroutines are blocked on the channel send.
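For illustration, here is a stripped-down, self-contained sketch of that deadlocking shape (not the poster's code; a plain int channel stands in for queue). Running it trips Go's runtime deadlock detector ("fatal error: all goroutines are asleep - deadlock!"):

package main

import (
    "fmt"
    "sync"
)

func main() {
    results := make(chan int) // unbuffered: a send blocks until someone receives
    var wg sync.WaitGroup

    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            results <- i // blocks forever: no receiver is running yet
        }(i)
    }

    wg.Wait()      // never returns: each Done is deferred until after the blocked send
    close(results) // never reached

    for r := range results {
        fmt.Println(r)
    }
}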

When you use a separate goroutine to wait, execution in the main goroutine actually reaches the range loop over queue.

// wg.Wait does not block the main goroutine.
go func() {
    wg.Wait()
    close(queue)
}()

This lets the goroutines unblock at the queue <- activity statement (the main goroutine starts reading off queue) and run to completion, which in turn calls each deferred wg.Done().

Once the waiting goroutine gets past wg.Wait, queue is closed and the main goroutine exits the range loop over it.
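Putting the pieces together, a minimal runnable sketch of the working pattern might look like the following (fetchActivity is a made-up stand-in for extractDetail, and the Activity type is reduced to a string):

package main

import (
    "fmt"
    "log"
    "sync"
)

// fetchActivity stands in for extractDetail; it fails on empty URLs.
func fetchActivity(url string) (string, error) {
    if url == "" {
        return "", fmt.Errorf("empty url")
    }
    return "activity for " + url, nil
}

func main() {
    urls := []string{"a", "", "b", "c"}
    queue := make(chan string)
    var wg sync.WaitGroup

    for _, u := range urls {
        wg.Add(1)
        go func(url string) {
            defer wg.Done()
            a, err := fetchActivity(url)
            if err != nil {
                log.Println(err)
                return
            }
            queue <- a // unblocks as soon as main starts ranging over queue
        }(u)
    }

    // Wait and close in a separate goroutine so the main goroutine is free to receive.
    go func() {
        wg.Wait()
        close(queue)
    }()

    var activities []string
    for a := range queue { // exits once queue is closed
        activities = append(activities, a)
    }
    fmt.Println(activities)
}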

The queue channel is unbuffered, so every goroutine trying to write to it blocks because the reader has not started yet. No goroutine can send, they all hang, and as a result wg.Wait waits forever. Alternatively, launch the reader in a separate goroutine:

go func() {
    for a := range queue {
        // receive each activity as it arrives and append it
        activities = append(activities, a)
    }
}()

and then start waiter:

wg.Wait() 
close(queue)

This way you do not accumulate all the data in the channel and overload it; you receive each item as it comes and put it into the target slice.
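A self-contained sketch of this alternative, again with placeholder data rather than real scraping. Note that a done channel is added here, beyond what the snippet above shows, so that main does not read activities before the reader goroutine has finished draining queue:

package main

import (
    "fmt"
    "sync"
)

func main() {
    queue := make(chan string)
    var wg sync.WaitGroup
    var activities []string
    done := make(chan struct{}) // closed once the reader has drained queue

    // Reader runs in its own goroutine and appends items as they arrive.
    go func() {
        defer close(done)
        for a := range queue {
            activities = append(activities, a)
        }
    }()

    for _, u := range []string{"a", "b", "c"} {
        wg.Add(1)
        go func(url string) {
            defer wg.Done()
            queue <- "activity for " + url // the running reader keeps this send from blocking forever
        }(u)
    }

    wg.Wait()    // all sends have completed
    close(queue) // ends the reader's range loop
    <-done       // wait for the reader before touching activities
    fmt.Println(activities)
}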
