
Dividing tasks among Goroutines concurrently

What this code does

The code fetches data from a PostgreSQL database. Of all the data, only two fields (Session and Text) are copied into the Task struct.

There are only 2 rows of each in my DB, so len(task) returns 2.

Here is the problem:

I make a buffered channel ch with a capacity equal to the length of the task slice (2 in this case).

I specify the maximum number of workers (goroutines) allowed, which is 20 here.

When I send the tasks into the channel, the code below sends all the elements of the task slice (here, 2) on every send, so the example code in DoTasks prints all of them twice (= the length of the task slice). An example is shown at the end.

What I need for this program to do

For example, suppose there are 100 items in the channel (len(task) = 100). I want to divide these 100 items among 20 goroutines, each taking care of 5 items (I don't know if this is possible; please suggest another solution if it is not).

So the 100 items would be handed to 20 workers, each taking 5 and running tasks with them, and at the end the channel would be closed and that's it.

This will be helpful now and when the database gets larger.

Which would be better: 20 workers sharing the tasks, or making the number of workers equal to the number of items in the channel?

var wg sync.WaitGroup

type Task struct {
    FetchedSession string
    FetchedText    string
}

func FetchAllData() {

    var task []Task

    //Fetch Session from DB
    var sess []database.UserSession
    database.DB.Find(&sess)
    //Fetch CommentText from DB
    var cmt []database.CommentReq
    database.DB.Find(&cmt)

    if len(sess) == len(cmt) {
        for i := range sess {
            task = append(task, Task{FetchedSession: sess[i].Session, FetchedText: cmt[i].CommentText})
        }
    }

    //making the Task Channel
    ch := make(chan []Task, len(task))

    MAX_WORKERS := 20

    wg.Add(MAX_WORKERS)

    for i := 0; i < MAX_WORKERS; i++ {
        go func() {
            for {
                t, ok := <-ch
                if !ok {
                    wg.Done()
                    return
                }
                DoTasks(t)
            }
        }()
    }

    for i := 0; i < len(task); i++ {
        ch <- task
    }

    close(ch)
    wg.Wait()
}

// The database currently holds 2 rows, so this function is called twice,
// and each call receives the whole slice from the channel.
func DoTasks(t []Task) {

    // Goal: with 100 tasks and 20 workers, each goroutine should handle
    // about 5 tasks, so this function would run 5 times per goroutine.
    // For each task, take FetchedSession and FetchedText and do the work.

    fmt.Println(t) // This prints all data twice

    // Finish one task and continue with the next
}

Example:

Example Data:
Task{FetchedSession: "EncodedString",  FetchedText: "Hello"}
Task{FetchedSession: "ExampleString",  FetchedText: "Hi"}
//Output
EncodedString
Hello
ExampleString
Hi
EncodedString
Hello
ExampleString
Hi
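
  • Change the task channel type.
ch := make(chan Task, len(task))

This means that each value passed on the channel represents a single task. The send loop has to change to match: send one element at a time with ch <- task[i] rather than the whole slice with ch <- task.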
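To make the difference concrete, here is a minimal sketch of filling a chan Task one element at a time. The enqueue helper and the sample data are only illustrative (they are not part of the original code), and the database fetch is replaced by a hard-coded slice.

```go
package main

import "fmt"

// Task mirrors the struct from the question.
type Task struct {
	FetchedSession string
	FetchedText    string
}

// enqueue is a hypothetical helper: it puts each task on a buffered
// channel as its own value, then closes the channel.
func enqueue(tasks []Task) <-chan Task {
	ch := make(chan Task, len(tasks)) // capacity = number of tasks, so sends never block
	for _, t := range tasks {
		ch <- t // one Task per send, not the whole slice
	}
	close(ch)
	return ch
}

func main() {
	tasks := []Task{
		{FetchedSession: "EncodedString", FetchedText: "Hello"},
		{FetchedSession: "ExampleString", FetchedText: "Hi"},
	}
	// Each task comes out exactly once.
	for t := range enqueue(tasks) {
		fmt.Println(t.FetchedSession, t.FetchedText)
	}
}
```

With the two example rows this prints each session/text pair once instead of printing everything twice.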

  • Simplify your channel iteration
    for i := 0; i < MAX_WORKERS; i++ {
        go func() {
            defer wg.Done()
            for t := range ch {
                DoTask(t)
            }
        }()
    }

wg.Done() now runs when the worker exits, whichever way it exits. range ch stops once the channel has been closed and all remaining tasks have been consumed.

  • Change the "Do" function to match, so that it receives a single task
func DoTask(t Task) {
    // handle one task here
}

About how to choose the number of workers:

Run some benchmarks for your FetchAllData function, and try changing MAX_WORKERS (or passing it as a parameter). The optimal value will depend on the task, and the available resources at the time of running the function, meaning the best value on your machine today might not be the best value on anyone else's machine, or your machine tomorrow. A benchmark should help you find a good approximate range to put the value in.
