The code fetches data from a PostgreSQL database. Of all the data, only two fields (Session and Text) are added to the Task struct.
There are only 2 rows of each in my DB, so len(task) returns 2.
Here is the problem:
I make a buffered channel ch with a capacity equal to the length of the task slice (in this case, 2).
I specify the maximum number of workers (goroutines) allowed; here that is 20.
When I send into the channel, the code below sends every element of the task slice each time (here, 2), so DoTasks prints all of them twice (= len(task)). An example is shown at the end.
For example, suppose there are 100 items in the channel, so len(task) = 100. I want to divide these 100 items among 20 goroutines, each handling 5 items (I don't know if this is possible; please suggest another solution if it isn't).
So the 100 items would be handed to 20 workers, each worker would take 5 of them and run the tasks, and at the end the channel would be closed.
This will be helpful now and as the database grows.
Which would be better: a fixed pool of 20 workers, or making the number of workers equal to the number of items in the channel?
var wg sync.WaitGroup

type Task struct {
    FetchedSession string
    FetchedText    string
}

func FetchAllData() {
    var task []Task

    // Fetch Session from DB
    var sess []database.UserSession
    database.DB.Find(&sess)

    // Fetch CommentText from DB
    var cmt []database.CommentReq
    database.DB.Find(&cmt)

    if len(sess) == len(cmt) {
        for i := range sess {
            task = append(task, Task{FetchedSession: sess[i].Session, FetchedText: cmt[i].CommentText})
        }
    }

    // Making the Task channel
    ch := make(chan []Task, len(task))

    MAX_WORKERS := 20
    wg.Add(MAX_WORKERS)
    for i := 0; i < MAX_WORKERS; i++ {
        go func() {
            for {
                t, ok := <-ch
                if !ok {
                    wg.Done()
                    return
                }
                DoTasks(t)
            }
        }()
    }

    for i := 0; i < len(task); i++ {
        ch <- task
    }
    close(ch)
    wg.Wait()
}

// Since the total number of rows in the database is 2,
// this function currently receives all the data from the channel and runs twice.
func DoTasks(t []Task) {
    // What I want: with 100 tasks and 20 workers,
    // each goroutine gets 5 tasks from the channel,
    // reads FetchedSession and FetchedText, and does the work.
    fmt.Println(t) // This prints all data twice
    // Finish one task and continue with the next
}
Example data:
Task{FetchedSession: "EncodedString", FetchedText: "Hello"}
Task{FetchedSession: "ExampleString", FetchedText: "Hi"}
//Output
EncodedString
Hello
ExampleString
Hi
EncodedString
Hello
ExampleString
Hi
Make the channel carry a single Task rather than a []Task:
ch := make(chan Task, len(task))
This means that each value passed on the channel represents a single task.
for i := 0; i < MAX_WORKERS; i++ {
    go func() {
        defer wg.Done()
        for t := range ch {
            DoTask(t)
        }
    }()
}
Because it is deferred, wg.Done() will now run when the worker exits. range ch will stop after the channel is closed and all tasks are consumed.
DoTask then takes a single Task instead of a slice:
func DoTask(t Task) {
About how to choose the number of workers:
Run some benchmarks for your FetchAllData function, and try changing MAX_WORKERS (or passing it as a parameter). The optimal value will depend on the task, and the available resources at the time of running the function, meaning the best value on your machine today might not be the best value on anyone else's machine, or your machine tomorrow. A benchmark should help you find a good approximate range to put the value in.