Some context:
I'm trying to debug a Go gRPC server where a specific API call seems to take a lot of time. This call does a number of reads from Kafka (let's say 10-20), so I expected it to take some time, just not quite as much.
One API call takes roughly 1-3 seconds to complete, but if I make 40 API calls in a script, it takes almost 30 seconds to complete all of them. It also doesn't complete them 'concurrently' the way I expected, which would be taking about 5 seconds for the first one and then returning one every second or so after that.
Instead, it takes 29 seconds and responds to all 40 requests at once, which causes the API caller to time out when the requests take too long.
I'm trying to profile the CPU to see where I'm spending the time, but I'm new to this and the output of the Go profiler is not making a lot of sense.
I've generated diagrams with go tool pprof, but I'm having some trouble interpreting the output.
[Image: CPU call graph]
- There's a box describing the time, type, build ID, etc. Is the Duration in this box the total time the CPU ran for, not including wait time?
[Image: block profile]
EDIT:
What my server does is retrieve some data from Kafka streams. I've identified what was slow, and I've written a script that contains only the Kafka-calling functions. Here's the script and the block profiling diagram.
Every consume from Kafka takes about 50-100ms, but because most of that time is spent doing I/O, I expected the throughput of the API to be quite high, which is not the case. If I make 100 calls it takes about 3 seconds; if I make 400 it takes about 10 seconds. I'm trying to see how I can improve the throughput of the API.
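package main

import (
	"fmt"
	"math/rand"
	"os"
	"os/signal"
	"runtime"
	"runtime/pprof"
	"sync"
	"sync/atomic"
	"time"

	"github.com/Shopify/sarama" // assumed import path; the post does not show its imports
)

// messageCount is not defined in the post; 100 is an assumed value taken
// from the runs described above.
const messageCount = 100

func main() {
	// CPU profile: samples only the time spent running on a CPU.
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	pprof.StartCPUProfile(f)
	defer pprof.StopCPUProfile()

	// Block profile: a rate of 1 records every blocking event.
	block, err := os.Create("block.prof")
	if err != nil {
		panic(err)
	}
	runtime.SetBlockProfileRate(1)
	p := pprof.Lookup("block")
	defer p.WriteTo(block, 0)

	var wg sync.WaitGroup
	wg.Add(messageCount)
	for i := 0; i < messageCount; i++ {
		go func() {
			// consume()
			withConsumer() // body not shown in the post
			wg.Done()
		}()
	}
	wg.Wait()
}

var servers = []string{"kafka-1", "kafka-2", "kafka-3"}
var count int64 // incremented atomically; consume runs in many goroutines
var partition = int32(0)

// consume creates a new consumer, reads a single message from a random
// offset of the partition, and logs how long the whole call took.
func consume() {
	index := atomic.AddInt64(&count, 1) - 1
	t := time.Now()
	fmt.Println("starting consume", index)
	defer func() {
		fmt.Println("consume ", index, "took", time.Since(t).String())
	}()

	// A nil config makes sarama use its default configuration.
	consumer, err := sarama.NewConsumer(servers, nil)
	if err != nil {
		panic(err)
	}
	defer consumer.Close()

	// Offset range to pick from; renamed so the builtins min and max are not shadowed.
	const minOffset, maxOffset int64 = 2, 1431401

	pc, err := consumer.ConsumePartition("source-topic", partition, rand.Int63n(maxOffset-minOffset)+minOffset)
	if err != nil {
		panic(err)
	}
	defer pc.Close()

	signals := make(chan os.Signal, 1)
	signal.Notify(signals, os.Interrupt)

	// Wait for either one message or an interrupt.
	select {
	case msg := <-pc.Messages():
		fmt.Println("msg: ", len(msg.Value))
	case <-signals:
		return
	}
}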
- There's a box describing the time, type, build ID, etc. Is the Duration in this box the total time the CPU ran for, not including wait time?
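Not quite. Duration is the wall-clock length of the profiling run; the "Total samples" figure shown next to it is the CPU time that was actually sampled. The samples themselves only cover on-CPU time, not waiting. https://golang.org/doc/diagnostics#profiling says: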
cpu: CPU profile determines where a program spends its time while actively consuming CPU cycles (as opposed to while sleeping or waiting for I/O).
- There are two types of edges, solid lines and dotted lines. What is the difference, and what does the time marked on the edges mean?
https://github.com/google/pprof/issues/493 says "Dotted/dashed lines indicated that that intervening nodes have been removed. Nodes are removed to keep graphs small enough for visualization."
https://gperftools.github.io/gperftools/cpuprofile.html says "Each edge is labelled with the time spent by the callee on behalf of the caller."
- Does the direction of the arrows indicate the calling direction? E.g. if function A calls B, will the graph show A -> B?
Yes.
- Each vertex has a time at the bottom, e.g. 0.01s(0.93%) out of 0.49s(45.79%). What does this time mean?
The first time is the local time. The second time is the cumulative time. https://gperftools.github.io/gperftools/cpuprofile.html elaborates:
The "local" time is the time spent executing the instructions directly contained in the procedure (and in any other procedures that were inlined into the procedure). The "cumulative" time is the sum of the "local" time and the time spent in any callees. If the cumulative time is the same as the local time, it is not printed.