
Goroutine didn't run as expected

I am still learning Go and was practicing with the web crawler exercise, as shown in the link. The main part I implemented is below. (The other parts remain the same and can be found in the link.)

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
    // TODO: Fetch URLs in parallel.
    // TODO: Don't fetch the same URL twice.
    // This implementation doesn't do either:
    if depth <= 0 {
        return
    }
    body, urls, err := fetcher.Fetch(url)
    cache.Set(url)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("found: %s %q\n", url, body)

    for _, u := range urls {
        if cache.Get(u) == false {
            fmt.Println("Next:", u)
            Crawl(u, depth-1, fetcher) // I want to parallelize this
        }
    }
    return
}

func main() {
    Crawl("https://golang.org/", 4, fetcher)
}

type SafeCache struct {
    v   map[string]bool
    mux sync.Mutex
}

func (c *SafeCache) Set(key string) {
    c.mux.Lock()
    c.v[key] = true
    c.mux.Unlock()
}

func (c *SafeCache) Get(key string) bool {
    return c.v[key]
}

var cache SafeCache = SafeCache{v: make(map[string]bool)}

When I run the code above, the result is as expected:

found: https://golang.org/ "The Go Programming Language"
Next: https://golang.org/pkg/
found: https://golang.org/pkg/ "Packages"
Next: https://golang.org/cmd/
not found: https://golang.org/cmd/
Next: https://golang.org/pkg/fmt/
found: https://golang.org/pkg/fmt/ "Package fmt"
Next: https://golang.org/pkg/os/
found: https://golang.org/pkg/os/ "Package os"

However, when I try to parallelize the crawler by changing `Crawl(u, depth-1, fetcher)` to `go Crawl(u, depth-1, fetcher)` (the commented line in the program above), the result is not what I expected:

found: https://golang.org/ "The Go Programming Language"
Next: https://golang.org/pkg/
Next: https://golang.org/cmd/

I thought simply adding the `go` keyword would be straightforward, but I'm not sure what went wrong, and I'm unsure how best to fix this. Any advice would be appreciated. Thank you in advance!

Your program is most likely exiting before the crawlers finish their work. One approach is to give `Crawl` a `WaitGroup` that waits for all of its sub-crawlers to finish. For example:

import "sync"

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, wg *sync.WaitGroup) {
    defer func() {
        // If the crawler was given a wait group, signal that it's finished
        if wg != nil {
            wg.Done()
        }
    }()

    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    cache.Set(url)
    if err != nil {
        fmt.Println(err)
        return
    }

    fmt.Printf("found: %s %q\n", url, body)

    var crawlers sync.WaitGroup
    for _, u := range urls {
        if cache.Get(u) == false {
            fmt.Println("Next:", u)
            crawlers.Add(1)
            go Crawl(u, depth-1, fetcher, &crawlers)
        }
    }
    crawlers.Wait() // Waits for its sub-crawlers to finish

    return 
}

func main() {
   // The root crawler does not need a WaitGroup
   Crawl("https://golang.org/", 4, fetcher, nil)
}
