Data race I fail to understand
Motivation: I have a huge JSON file which I intend to parse and do something with.
Now, I'm certain there is some library that already does this, but I thought of doing it myself to better understand Go's concurrency constructs.
So my objective is to read the file using a scanner and pump the data onto a []byte chan, like this:
// Not the actual code.
for scanner.Scan() {
input <- []byte(scanner.Text())
}
I ask more than one goroutine to receive data from the input chan, unmarshal the JSON, report the result (whether the unmarshal was a success or not), and also display a progress bar:
// not the actual code.
for {
bytes := <- input
if err := json.Unmarshal(bytes); err != nil {
errorchan <- true
} else {
successchan <- true
}
progress <- size_of_byte(bytes)
}
// now have other go-routine to handle errorchan, successchan and progress thing.
All of this seems logical on paper, but when I assemble the code (given below) I see a data race, and I have tried my best to understand how that data race is happening but could not (I had already removed some other data races that were present in an earlier version of the code):
workers 0xc0000c2000
 Completed 0.000000
==================
WARNING: DATA RACE
Read at 0x00c0000c2048 by goroutine 8:
mongo_import/race-d.readFile()
/Users/admin/Documents/goProject/src/mongo_import/race-d/main.go:197 +0x6ff
mongo_import/race-d.TestReadJson()
/Users/admin/Documents/goProject/src/mongo_import/race-d/main_test.go:8 +0x47
testing.tRunner()
/usr/local/Cellar/go/1.13.7/libexec/src/testing/testing.go:909 +0x199
Previous write at 0x00c0000c2048 by goroutine 12:
mongo_import/race-d.(*Worker).trackSuccess()
/Users/admin/Documents/goProject/src/mongo_import/race-d/main.go:103 +0xc0
Goroutine 8 (running) created at:
testing.(*T).Run()
/usr/local/Cellar/go/1.13.7/libexec/src/testing/testing.go:960 +0x651
testing.runTests.func1()
/usr/local/Cellar/go/1.13.7/libexec/src/testing/testing.go:1202 +0xa6
testing.tRunner()
/usr/local/Cellar/go/1.13.7/libexec/src/testing/testing.go:909 +0x199
testing.runTests()
/usr/local/Cellar/go/1.13.7/libexec/src/testing/testing.go:1200 +0x521
testing.(*M).Run()
/usr/local/Cellar/go/1.13.7/libexec/src/testing/testing.go:1117 +0x2ff
main.main()
_testmain.go:44 +0x223
Goroutine 12 (running) created at:
mongo_import/race-d.(*Worker).Start()
/Users/admin/Documents/goProject/src/mongo_import/race-d/main.go:72 +0x15f
==================
--- FAIL: TestReadJson (1.18s)
testing.go:853: race detected during execution of test
FAIL
FAIL mongo_import/race-d 1.192s
FAIL
The data race in the testing package is something new to me. But I'm unable to comprehend why this part is resulting in a data race (it makes no sense to me):
Previous write at 0x00c0000c2048 by goroutine 12:
  mongo_import/race-d.(*Worker).trackSuccess()
      /Users/admin/Documents/goProject/src/mongo_import/race-d/main.go:103 +0xc0
Goroutine 12 (running) created at:
  mongo_import/race-d.(*Worker).Start()
      /Users/admin/Documents/goProject/src/mongo_import/race-d/main.go:72 +0x15f
Code:
Here is how the code looks:
package main
import (
"bufio"
"encoding/binary"
"encoding/json"
"fmt"
"log"
"os"
"sync"
"time"
)
// thread that does that job of unmarshal
type Thread struct {
w *Worker
}
// Run the individual thread and process the bytes
// read from the worker.input chan
func (thread Thread) Run() {
for {
bytes, ok := <-thread.w.input
if !ok {
return
}
var data map[string]interface{}
if err := json.Unmarshal(bytes, &data); err != nil {
thread.w.errorChan <- true
} else {
thread.w.successChan <- true
}
thread.w.progress <- int64(binary.Size(bytes))
// do other thing
// like insert in db etc.
}
}
// Worker that coordinates the threads and tracks results
type Worker struct {
errmutex sync.Mutex
succmutex sync.Mutex
progmutex sync.Mutex
wg sync.WaitGroup
done bool
workers int
fileSize int64
completedByte int64
errorCount int
successCount int
input chan []byte
progress chan int64
errorChan chan bool
successChan chan bool
}
// NewWorker creates a Worker with the given worker count
func NewWorker(count int) *Worker {
return &Worker{workers: count}
}
// start the worker
func (w *Worker) Start() {
fmt.Printf("workers %p\n", w)
w.wg.Add(1)
go w.display()
w.wg.Add(1)
go w.trackProgress()
w.wg.Add(1)
go w.trackSuccess()
w.wg.Add(1)
go w.trackError()
w.wg.Add(1)
go w.Spawn()
w.wg.Wait()
}
// add the error count
func (w *Worker) trackError() {
w.wg.Done()
for {
_, ok := <-w.errorChan
if !ok {
return
}
w.errmutex.Lock()
w.errorCount = w.errorCount + 1
w.errmutex.Unlock()
}
}
// add the success count
func (w *Worker) trackSuccess() {
defer w.wg.Done()
for {
_, ok := <-w.successChan
if !ok {
return
}
w.succmutex.Lock()
w.successCount += 1
w.succmutex.Unlock()
}
}
// spawn individual thread to process the bytes
func (w *Worker) Spawn() {
defer w.wg.Done()
defer w.clean()
var wg sync.WaitGroup
for i := 0; i < w.workers; i++ {
wg.Add(1)
go func() {
defer wg.Done()
Thread{w: w}.Run()
}()
}
wg.Wait()
}
// close the other open chan
func (w *Worker) clean() {
close(w.errorChan)
close(w.successChan)
close(w.progress)
}
// close the input chan
func (w *Worker) Done() {
close(w.input)
}
// sum the total byte we have processed
func (w *Worker) trackProgress() {
defer w.wg.Done()
for {
read, ok := <-w.progress
if !ok {
w.done = true
return
}
w.progmutex.Lock()
w.completedByte += read
w.progmutex.Unlock()
}
}
// display the progress bar
func (w *Worker) display() {
defer w.wg.Done()
for !w.done {
w.progmutex.Lock()
percentage := (float64(w.completedByte) / float64(w.fileSize)) * 100
w.progmutex.Unlock()
fmt.Printf("\r Completed %f", percentage)
time.Sleep(5 * time.Second)
}
}
func readFile(path string) map[string]int {
handler, err := os.Open(path)
if err != nil {
log.Fatal(err)
}
defer handler.Close()
worker := &Worker{workers: 2}
worker.input = make(chan []byte, 2)
worker.progress = make(chan int64, 1)
worker.errorChan = make(chan bool, 1)
worker.successChan = make(chan bool, 1)
if fi, err := handler.Stat(); err != nil {
log.Fatal(err)
} else {
worker.fileSize = fi.Size()
}
scanner := bufio.NewScanner(handler)
go worker.Start()
for scanner.Scan() {
worker.input <- []byte(scanner.Text())
}
worker.Done()
if err := scanner.Err(); err != nil {
log.Fatal(err)
return nil
}
return map[string]int{
"error": worker.errorCount,
"success": worker.successCount,
}
}
func main() {
readFile("dump.json")
}
And the test code:
package main // main_test.go
import (
"testing"
)
func TestReadJson(t *testing.T) {
data := readFile("dump2.json")
if data == nil {
t.Error("we got a nil data")
}
}
And here is the sample dump2.json data:
{"name": "tutorialspoint10"}
{"name":"tutorialspoint2", "age": 15}
{"name":"tutorialspoint3", "age": 25}
{"name":"tutorialspoint4", "age": 28}
{"name":"tutorialspoint5", "age": 40}
{"name": "tutorialspoint6"}
{"name":"tutorialspoint8", "age": 7}
{"name":"tutorialspoint4", "age": 55}
{"name":"tutorialspoint1","age":4}
{"name":"tutorialspoint2"}
Lastly, I know the code posted here should be minimal, and I tried my best to keep it minimal (it is extracted from the original project). I'm not sure how (or whether I'm capable, as of now) to reduce it further.
You need a read lock at line main.go:197:
"success": worker.successCount,
As the log says, you try to read while another goroutine tries to write:
/Users/admin/Documents/goProject/src/mongo_import/race-d/main.go:197
A short explanation: https://dev.to/wagslane/golang-mutexes-what-is-rwmutex-for-57a0
It may be better in this situation to use atomic operations: https://gobyexample.com/atomic-counters