
How to handle multiple goroutines that share the same channel

I've searched a lot but haven't found an answer to my problem yet.

I need to make multiple concurrent calls to an external API, each with different parameters. For each call I need to initialize a struct for its dataset and process the data I receive from that call. Bear in mind that I read the incoming stream line by line and immediately send each line to the channel.

The first problem I encountered, which was not obvious at first because of the large quantity of data I'm receiving, is that each goroutine does not receive all the data that goes through the channel (which I learned from my research). So what I need is a way of requeueing/redirecting that data to the correct goroutine.
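To illustrate the behavior described above: a value sent on a Go channel is delivered to exactly one receiver, so several goroutines reading from the same channel split the stream between them instead of each seeing all of it. The following is a minimal, self-contained sketch of that; `fanOut` and its parameters are made up for this illustration and are not part of the code in the question.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut sends the values 0..n-1 on one shared channel and starts `workers`
// goroutines that all receive from it. It returns how many values each
// worker received. (Hypothetical helper, for illustration only.)
func fanOut(n, workers int) []int {
	ch := make(chan int)
	counts := make([]int, workers)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for range ch {
				counts[id]++ // each worker only touches its own slot
			}
		}(w)
	}

	for i := 0; i < n; i++ {
		ch <- i
	}
	close(ch)
	wg.Wait()
	return counts
}

func main() {
	counts := fanOut(1000, 4)
	total := 0
	for id, c := range counts {
		fmt.Printf("worker %d received %d values\n", id, c)
		total += c
	}
	// The per-worker counts vary from run to run, but they always add up
	// to the number of values sent: no value is delivered twice.
	fmt.Println("total:", total)
}
```

How the values are split between the workers is up to the scheduler, which is why a consumer that only cares about one dataset cannot rely on receiving all of that dataset's messages from a shared channel.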

The function that sends the streamed response for a single dataset (I've cut the parts of the code that are out of context):

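func (api *API) RequestData(ctx context.Context, c chan DWeatherResponse, dataset string, wg *sync.WaitGroup) error {
    // Balance the wg.Add(1) in the caller so that wg.Wait() can return
    // (this may already be part of the omitted code).
    defer wg.Done()

    // reader is created in the omitted part of the function.
    for {
        line, err := reader.ReadBytes('\n')
        if err != nil {
            // Printf, not Println: the message uses a format verb.
            log.Printf("End of %s", dataset)
            return err
        }

        data, err := extractDataFromStreamLine(string(line), dataset)
        if err != nil {
            // Skip lines that cannot be parsed.
            continue
        }

        c <- *data
    }
}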

The function that processes the incoming data:

func (s *StrikeStruct) Process(ch, requeue chan dweather.DWeatherResponse) {
    // Loop until ch is closed and drained.
    for data := range ch {
        // data contains {dataset string, value float64, date time.Time}.
        // s.Parameter needs to match the dataset.

        // IMPORTANT PART: check whether the received data belongs to this
        // struct's dataset. If not, I want to send it to another goroutine
        // until it gets to the correct one. There will be a maximum of 4
        // datasets, but this still might not be the best approach.
        if !api.GetDataset(s.Parameter, data.Dataset) {
            requeue <- data
            continue
        }
        // Do stuff with the data from this point
    }
}

Now on my own API endpoint I have the following:

ch := make(chan dweather.DWeatherResponse, 2)
requeue := make(chan dweather.DWeatherResponse)
final := make(chan strike.StrikePerYearResponse)

var wg sync.WaitGroup

for _, s := range args.Parameters.Strikes {
    strike := strike.StrikePerYear{
        Parameter:   strike.Parameter(s.Dataset),
        StrikeValue: s.Value,
    }

    // I receive and process the data in here
    go strike.ProcessStrikePerYear(ch, requeue, final, string(s.Dataset))
}

go func() {
    for {
        data, _ := <-requeue
        ch <- data
    }
}()

// Creates a goroutine for each dataset
for _, dataset := range api.Params.Dataset {
    wg.Add(1)
    go api.RequestData(ctx, ch, dataset, &wg)
}

wg.Wait()
close(ch)

// Once all the data is processed, the results are appended
var strikes []strike.StrikePerYearResponse
for range args.Fetch.Datasets {
    strikes = append(strikes, <-final)
}

return strikes

The issue with this code is that as soon as I start receiving data from more than one endpoint, the requeue blocks and nothing more happens. If I remove the requeue logic, data is lost whenever it does not land on the correct goroutine.

My two questions are:

  1. Why does the requeue block if there is a goroutine always ready to receive from it?
  2. Should I take a different approach to processing the incoming data?

This is not a good way to solve your problem; you should change your approach. I suggest an implementation like the one below:

package main

import (
    "fmt"
    "sync"
)

// answer for https://stackoverflow.com/questions/68454226/how-to-handle-multiple-goroutines-that-share-the-same-channel

var (
    finalResult = make(chan string)
)

// IData is used by the message dispatcher; every data type must implement its methods
type IData interface {
    IsThisForMe() bool
    Process(*sync.WaitGroup)
}

//MainData can be your main struct like StrikePerYear
type MainData struct {
    // add any props
    Id   int
    Name string
}

type DataTyp1 struct {
    MainData *MainData
}

func (d DataTyp1) IsThisForMe() bool {
    // check here whether the incoming data belongs to this type
    return d.MainData.Id == 2
}

func (d DataTyp1) Process(wg *sync.WaitGroup) {
    d.MainData.Name = "processed by DataTyp1"
    // send result to final channel, you can change it as you want
    finalResult <- d.MainData.Name
    wg.Done()
}

type DataTyp2 struct {
    MainData *MainData
}

func (d DataTyp2) IsThisForMe() bool {
    // check here whether the incoming data belongs to this type
    return d.MainData.Id == 3
}

func (d DataTyp2) Process(wg *sync.WaitGroup) {
    d.MainData.Name = "processed by DataTyp2"
    // send result to final channel, you can change it as you want
    finalResult <- d.MainData.Name
    wg.Done()
}

// dispatcher runs a new goroutine for each request.
// You can implement a worker pool to avoid running too many goroutines.
func dispatcher(incomingData *MainData, wg *sync.WaitGroup) {
    // depending on your requirements, you can remove this goroutine
    go func() {
        var p IData
        p = DataTyp1{incomingData}
        if p.IsThisForMe() {
            go p.Process(wg)
            return
        }
        p = DataTyp2{incomingData}
        if p.IsThisForMe() {
            go p.Process(wg)
            return
        }
        // No type claimed the data: release the WaitGroup so that
        // wg.Wait() in main does not block forever.
        wg.Done()
    }()
}

func main() {
    dummyDataArray := []MainData{
        MainData{Id: 2, Name: "this data #2"},
        MainData{Id: 3, Name: "this data #3"},
    }
    wg := sync.WaitGroup{}
    for i := range dummyDataArray {
        wg.Add(1)
        dispatcher(&dummyDataArray[i], &wg)
    }
    result := make([]string, 0)
    done := make(chan struct{})
    // data collector
    go func() {
    loop:
        for {
            select {
            case <-done:
                break loop
            case r := <-finalResult:
                result = append(result, r)
            }
        }
    }()
    wg.Wait()
    done <- struct{}{}
    for _, s := range result {
        fmt.Println(s)
    }
}

Note: this is only meant to give you ideas for finding a better solution; it is certainly not production-ready code.
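Another way to avoid requeueing altogether, sketched here under assumptions, is to give each dataset its own channel and let a single router goroutine forward every message to the channel that matches its dataset, so a message never lands on the wrong consumer. The `Response` type, the `route` and `sumPerDataset` functions, and the dataset names below are hypothetical stand-ins for `DWeatherResponse` and the real processing code.

```go
package main

import (
	"fmt"
	"sync"
)

// Response is a hypothetical stand-in for dweather.DWeatherResponse.
type Response struct {
	Dataset string
	Value   float64
}

// route reads every message from in and forwards it to the channel
// registered for its dataset. Messages for unknown datasets are dropped.
// When in is closed and drained, all output channels are closed too.
func route(in <-chan Response, outs map[string]chan Response) {
	for msg := range in {
		if out, ok := outs[msg.Dataset]; ok {
			out <- msg
		}
	}
	for _, out := range outs {
		close(out)
	}
}

// sumPerDataset starts one consumer per dataset, routes msgs to them and
// returns the sum of the values each consumer received.
func sumPerDataset(datasets []string, msgs []Response) map[string]float64 {
	in := make(chan Response)
	outs := make(map[string]chan Response, len(datasets))
	for _, d := range datasets {
		outs[d] = make(chan Response)
	}

	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		sums = make(map[string]float64, len(datasets))
	)
	for _, d := range datasets {
		wg.Add(1)
		go func(name string, ch <-chan Response) {
			defer wg.Done()
			total := 0.0
			for msg := range ch { // only this dataset's messages arrive here
				total += msg.Value
			}
			mu.Lock()
			sums[name] = total
			mu.Unlock()
		}(d, outs[d])
	}

	go route(in, outs)
	for _, m := range msgs {
		in <- m
	}
	close(in)
	wg.Wait()
	return sums
}

func main() {
	msgs := []Response{
		{"rain", 1.5}, {"temp", 20}, {"rain", 2.5}, {"wind", 9},
	}
	sums := sumPerDataset([]string{"rain", "temp"}, msgs)
	fmt.Println("rain:", sums["rain"], "temp:", sums["temp"])
}
```

With this layout the producers can still share a single input channel, but the number of goroutines stays fixed at one router plus one consumer per dataset, and no message has to travel back into the channel it came from.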
