
How do I add an object to a channel from a goroutine that is receiving data from that channel?

Basically, I am trying to write a concurrent sitemap crawler using goroutines. One sitemap can contain links to multiple sitemaps, which can contain links to other sitemaps, and so on.

Right now, this is my design:

worker:
     - receives url from channel
     - processUrl(url)
processUrl:
     for each link in lookup(url):
         - if link is sitemap:
                channel <- link
           else:
               print(link)
main:
    - create 10 workers
    - channel <- root url

The problem is that a worker won't accept a new url from the channel until processUrl() has finished, and processUrl() won't finish until a worker accepts the new url it is pushing into the channel. What concurrent design can I use to add urls to a task queue without busy-waiting and without blocking on channel <- url?
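Here is a minimal sketch of the blocking cycle, stripped of the crawler details (names are illustrative, not from the real code):

package main

import "fmt"

func main() {
    urlChan := make(chan string) // unbuffered

    // one worker: receives a url, then tries to send a child url back
    go func() {
        for url := range urlChan {
            fmt.Println("processing", url)
            // this send blocks until some worker receives, but this
            // worker is the only receiver and it is stuck right here
            urlChan <- url + "/child.xml"
        }
    }()

    urlChan <- "root.xml"
    select {} // runtime reports: all goroutines are asleep - deadlock!
}

With 10 workers the same thing happens as soon as all of them are blocked on that send at the same time.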

Here is the actual code, if it helps:

func (c *SitemapCrawler) worker() {
    for url := range urlChan {
        fmt.Println(url)
        c.crawlSitemap(url)
    }
}
func crawlUrl(url string) {
    defer crawlWg.Done()
    crawler := NewCrawler(url)
    for i := 0; i < MaxCrawlRate*20; i++ {
        go crawler.worker()
    }
    crawler.getSitemaps()
    pretty.Println(crawler.sitemaps)
    crawler.crawlSitemaps()
}
func (c SitemapCrawler) crawlSitemap(url string) {
    defer crawlWg.Done() // replaces the crawlWg.Done() call on every return path
    c.limiter.Take()
    resp, err := MakeRequest(url)
    if err != nil || resp.StatusCode != 200 {
        return
    }
    var respTxt []byte
    if strings.Contains(resp.Header.Get("Content-Type"), "html") {
        return
    } else if strings.Contains(url, ".gz") || resp.Header.Get("Content-Encoding") == "gzip" {
        reader, err := gzip.NewReader(resp.Body)
        if err != nil {
            panic(err)
        }
        respTxt, err = ioutil.ReadAll(reader)
        if err != nil {
            panic(err)
        }
        reader.Close()
    } else {
        respTxt, err = ioutil.ReadAll(resp.Body)
        if err != nil {
            return
        }
    }
    io.Copy(ioutil.Discard, resp.Body)
    resp.Body.Close()

    d, err := libxml2.ParseString(string(respTxt))
    if err != nil {
        return
    }
    results, err := d.Find("//*[contains(local-name(), 'loc')]")
    if err != nil {
        return
    }
    locs := results.NodeList()
    printLock.Lock()
    for i := 0; i < len(locs); i++ {
        newUrl := locs[i].TextContent()
        if strings.Contains(newUrl, ".xml") {
            crawlWg.Add(1)
            // this send blocks until a worker receives; every worker
            // can end up stuck here (while still holding printLock)
            urlChan <- newUrl
        } else {
            fmt.Println(newUrl)
        }
    }
    printLock.Unlock()
}

Write operations on a channel block when the channel is unbuffered.

To create a buffered channel:

urlChan := make(chan string, len(allUrls))

When this channel is full, however, write operations will block again.
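For example (a throwaway snippet; the capacity of 2 is arbitrary):

ch := make(chan string, 2)
ch <- "a"                     // does not block: buffer has room
ch <- "b"                     // does not block: buffer is now full
// ch <- "c"                  // would block until a receiver drains the buffer
fmt.Println(len(ch), cap(ch)) // prints: 2 2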

Alternatively, you can use a select with a default case. When the write would block, it immediately falls through to default:

select {
case urlChan <- url:
    fmt.Println("url sent")
default:
    fmt.Println("channel full, url dropped")
}
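Note that with the default case the url is simply dropped whenever the channel is full, so the crawler would need to retry it or record it somewhere else.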

To put a timeout on the channel write, do the following:

select {
case urlChan <- url:
    fmt.Println("url sent")
case <-time.After(5 * time.Second):
    fmt.Println("timed out")
}

Or, finally, perform the write in a separate goroutine:

go func(u string) {
    urlChan <- u
}(url)
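Applied to the loop in crawlSitemap, that last option could look like this (a sketch based on the question's code; the url is passed as an argument so every goroutine sends its own copy):

for i := 0; i < len(locs); i++ {
    newUrl := locs[i].TextContent()
    if strings.Contains(newUrl, ".xml") {
        crawlWg.Add(1)
        // send from a fresh goroutine so this worker goes straight
        // back to receiving from urlChan instead of blocking here
        go func(u string) {
            urlChan <- u
        }(newUrl)
    } else {
        fmt.Println(newUrl)
    }
}

Each pending url now parks one goroutine until a worker picks it up, which is cheap in Go but effectively makes the queue unbounded.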
