Golang: how do I close a channel after all goroutines have finished?
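I would like to write a simple web scraper in Go that collects the enterprise links from a listing page, fetches each linked page to extract a few fields, and writes them to a CSV file.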
Here's my code:
package main

import (
	"encoding/csv"
	"flag"
	"fmt"
	"github.com/PuerkitoBio/goquery"
	"log"
	"net/http"
	"net/url"
	"os"
	"strings"
	"sync"
)

type Enterprise struct {
	name     string
	tax_code string
	group    string
	capital  string
}

var u, f string
var name, tax_code, group, capital string

func init() {
	flag.StringVar(&u, "u", "", "Which URL to download from")
	flag.StringVar(&f, "f", "", "Path to the csv file to write the output to")
}

func check(e error) {
	if e != nil {
		panic(e)
	}
}

func findHrefs(u string) map[string]string {
	resp, err := http.Get(u)
	check(err)
	doc, err := goquery.NewDocumentFromResponse(resp)
	check(err)
	e_hrefs := make(map[string]string)
	doc.Find("td div a").Each(func(_ int, s *goquery.Selection) {
		e_href, _ := s.Attr("href")
		if strings.HasPrefix(e_href, "/Thong-tin-doanh-nghiep") && s.Text() != "" {
			e_hrefs[e_href] = s.Text()
		}
	})
	return e_hrefs
}

func fetch(url string, name string, file *os.File, wg *sync.WaitGroup, c chan Enterprise) {
	defer wg.Done()
	log.Println("Fetching URL", url)
	resp, err := http.Get(url)
	check(err)
	doc, err := goquery.NewDocumentFromResponse(resp)
	check(err)
	e := new(Enterprise)
	doc.Find("td").Each(func(_ int, s *goquery.Selection) {
		if s.Text() == "Mã số thuế:" {
			e.tax_code = s.Next().Text()
		}
		if s.Text() == "Tên ngành cấp 2:" {
			e.group = s.Next().Text()
		}
		if s.Text() == "Sở hữu vốn:" {
			e.capital = s.Next().Text()
		}
	})
	w := csv.NewWriter(file)
	w.Write([]string{name, "'" + e.tax_code, e.group, e.capital})
	w.Flush()
	c <- *e
}

func getDoc(u, f string) {
	parsedUrl, err := url.Parse(u)
	check(err)
	file, err := os.Create(f)
	check(err)
	defer file.Close()
	var wg sync.WaitGroup
	c := make(chan Enterprise)
	e_hrefs := findHrefs(u)
	for e_href, name := range e_hrefs {
		wg.Add(1)
		go fetch(parsedUrl.Scheme+"://"+parsedUrl.Host+e_href, name, file, &wg, c)
	}
	wg.Wait()
}

func main() {
	flag.Parse()
	if u == "" || f == "" {
		fmt.Println("-u=<URL to download from> -f=<Path to the CSV file>")
		os.Exit(1)
	}
	getDoc(u, f)
}
The problem is that the channel is not closed after all goroutines have finished, and I have to press Ctrl+C to get my shell prompt back:
2016/03/02 09:34:05 Fetching URL ...
2016/03/02 09:34:05 Fetching URL ...
2016/03/02 09:34:05 Fetching URL ...
^Csignal: interrupt
After reading this, I changed the last line of the getDoc func to something like:
go func() {
	wg.Wait()
	close(c)
}()
Now I get my shell prompt back, but the channel is closed before all goroutines have finished and nothing is written to the CSV file.
Where did I go wrong?
To me it doesn't look like you're reading from your channel, and because it is an unbuffered (synchronous) channel (you never gave it a capacity), a send on it blocks until another goroutine receives the value. So you need to be reading from c, e.g. with value := <-c, or your fetch function will just hang at c <- *e.
This means fetch never reaches its deferred wg.Done(), so the sync.WaitGroup counter never decrements, so wg.Wait() never stops blocking, and so your close(c) never gets called.
My original code was something like this:
e_hrefs := findHrefs(u)
w := csv.NewWriter(file)
for e_href, name := range e_hrefs {
	wg.Add(1)
	go fetch(parsedUrl.Scheme+"://"+parsedUrl.Host+e_href, name, &wg, c)
	e := <-c
	w.Write([]string{name, "'" + e.tax_code, e.group, e.capital})
	w.Flush()
}
wg.Wait()
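As you can see, it's not concurrent: the e := <-c inside the loop waits for each fetch to send its result before the next goroutine is started, so the pages are fetched one at a time.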
I've just fixed it by using a range clause to iterate over the channel:
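e_hrefs := findHrefs(u)
for e_href, name := range e_hrefs {
	wg.Add(1)
	go fetch(parsedUrl.Scheme+"://"+parsedUrl.Host+e_href, name, &wg, c)
}
// Close c only after every fetch has called wg.Done().
go func() {
	wg.Wait()
	close(c)
}()
// Receive results until c is closed; only this goroutine writes to the file.
// fetch must now fill in e.name, since each row is built from e alone.
w := csv.NewWriter(file)
for e := range c {
	w.Write([]string{e.name, "'" + e.tax_code, e.group, e.capital})
	w.Flush()
}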