Go iterator reading 1 million rows from BigQuery 10x slower than Java or Kotlin?

My intention is to query BigQuery and index some fields in Elasticsearch using Go. It will be a one-time batch job. Since the team has knowledge of Java, we decided to benchmark both languages. I have noticed that Go is slow when using the "iterator way".

Why is there such a difference in time?

Am I missing some client or query configuration in Go, or is this the expected behavior?

How can I improve this reading time?

Both Java/Kotlin and Go:

  • run in exactly the same environment.
  • BigQuery dataset: 200 GB.
  • Same SQL query, joining two tables and retrieving only about 12 fields, with a LIMIT of 1 million rows.
  • Running the example code from the GCP docs for both languages, using interactive queries: https://cloud.google.com/bigquery/docs/running-queries

(I have simplified the code.)

Go 1.16.3

...

type Test struct {
    TestNo    *big.Rat              `bigquery:"testNo,nullable"`
    TestId    bigquery.NullString   `bigquery:"testId"`
    TestTime  bigquery.NullDateTime `bigquery:"testTime"`
    FirstName bigquery.NullString   `bigquery:"firstName"`
    LastName  bigquery.NullString   `bigquery:"lastName"`
    Items     []ItemTest            `bigquery:"f0_"`
}

type ItemTest struct {
    ItemType  bigquery.NullString `bigquery:"itemType"`
    ItemNo    bigquery.NullString `bigquery:"itemNo"`
    ProductNo *big.Rat            `bigquery:"productNo,nullable"`
    Qty       *big.Rat            `bigquery:"qty,nullable"`
    Name      bigquery.NullString `bigquery:"name"`
    Price     *big.Rat            `bigquery:"price,nullable"`
}


ctx := context.Background()
client, err := bigquery.NewClient(ctx, projectID)
if err != nil {
    // TODO: Handle error.
}


q := client.Query(myQuery)

it, err := q.Read(ctx)
if err != nil {
    // TODO: Handle error.
}


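// Accumulates only the time spent inside it.Next (fetching and decoding rows).
var elapsed time.Duration

for {
    // time.Now() keeps the monotonic clock reading; .UTC() would strip it.
    start := time.Now()

    var t Test
    err := it.Next(&t)
    if err == iterator.Done {
        break
    }
    if err != nil {
        // TODO: Handle error.
    }

    elapsed += time.Since(start)

    IndexToES(t)
}

fmt.Println(elapsed) // 13 minutes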

...

It takes 13 minutes to read the rows and map them to Go structs.

Kotlin

...

val start: BigDecimal = Instant.now().toEpochMilli().toBigDecimal().setScale(3)

val bigquery = BigQueryOptions.newBuilder()
            .setCredentials(credentials)
            .setProjectId(PROJECT_ID)
            .build()
            .service

val queryConfig = QueryJobConfiguration.newBuilder(query).build()

val tableResult = bigquery.query(queryConfig)

val test = tableResult.iterateAll()
            .map { myMapper.mapToTest(it) }

val end: BigDecimal = Instant.now().toEpochMilli().toBigDecimal().setScale(3)


logResults(start, end) // 60000 ms = 1 minute

fun logResults(start: BigDecimal, end: BigDecimal) {
    println("query: " + (end - start).setScale(0) + "ms")
}

// iterate through test and index at the same time
...

It takes 1 minute...

Neither snippet is complete, so it is unclear whether this is an apples-to-apples comparison. If you're wondering where the time is going in the Go program, consider using pprof.

The other thing to point out is that if you're reading millions of rows of query output, you're going to want to take a look at the BigQuery Storage API. Using it rather than the iterators you're currently testing against can make this faster in both languages.
