Go iterator reading 1 million rows from BigQuery 10x slower than Java or Kotlin?
My intention is to query BigQuery and index some fields in Elasticsearch using Go. It will be a one-time batch job. Since the team knows Java, we decided to benchmark both languages. I have noticed that Go is slow when reading rows the "iterator way".
Why is there this difference in time?
Am I missing some client or query configuration in Go, or is this the expected behavior?
How can I improve this read time?
Both Java/Kotlin and Go:
(I have simplified the code)
Go 1.16.3
...
type Test struct {
	TestNo    *big.Rat              `bigquery:"testNo,nullable"`
	TestId    bigquery.NullString   `bigquery:"testId"`
	TestTime  bigquery.NullDateTime `bigquery:"testTime"`
	FirstName bigquery.NullString   `bigquery:"firstName"`
	LastName  bigquery.NullString   `bigquery:"lastName"`
	Items     []ItemTest            `bigquery:"f0_"`
}

type ItemTest struct {
	ItemType  bigquery.NullString `bigquery:"itemType"`
	ItemNo    bigquery.NullString `bigquery:"itemNo"`
	ProductNo *big.Rat            `bigquery:"productNo,nullable"`
	Qty       *big.Rat            `bigquery:"qty,nullable"`
	Name      bigquery.NullString `bigquery:"name"`
	Price     *big.Rat            `bigquery:"price,nullable"`
}

ctx := context.Background()
client, err := bigquery.NewClient(ctx, projectID)
if err != nil {
	// TODO: Handle error.
}
q := client.Query(myQuery)
it, err := q.Read(ctx)
if err != nil {
	// TODO: Handle error.
}
var end time.Duration // accumulated time spent inside it.Next only
for {
	start := time.Now().UTC()
	var t Test
	err := it.Next(&t)
	if err == iterator.Done {
		break
	}
	if err != nil {
		// TODO: Handle error.
	}
	end += time.Since(start)
	IndexToES(t)
}
fmt.Println(end) // 13 minutes
...
This takes 13 minutes to read the rows and map them to Go structs.
Kotlin
...
val start: BigDecimal = Instant.now().toEpochMilli().toBigDecimal().setScale(3)
val bigquery = BigQueryOptions.newBuilder()
    .setCredentials(credentials)
    .setProjectId(PROJECT_ID)
    .build()
    .service
val queryConfig = QueryJobConfiguration.newBuilder(query).build()
val tableResult = bigquery.query(queryConfig)
val test = tableResult.iterateAll()
    .map { myMapper.mapToTest(it) }
val end: BigDecimal = Instant.now().toEpochMilli().toBigDecimal().setScale(3)
logResults(start, end) // 60000ms = 1 minute

fun logResults(start: BigDecimal, end: BigDecimal) {
    println("query: " + (end - start).setScale(0) + "ms")
}
// iterate through test, indexing at the same time
...
This takes 1 minute.
Neither snippet is complete, so it is unclear whether this is an apples-to-apples comparison. If you're wondering where the time is going in the Go program, consider using pprof.
The other thing to point out is that if you're reading millions of rows of query output, you'll want to take a look at the BigQuery Storage API. Using it rather than the iterators you're currently testing against can make this faster in both languages.