简体   繁体   中英

How to slice a map[string]int into chunks

My objective is to take a map[string]int containing potentially up to a million entries and chunk it in sizes of up to 500 and POST the map to an external service. I'm newer to golang, so I'm tinkering in the Go Playground for now.

Any tips anyone has on how to improve the efficiency of my code base, please share!

Playground: https://play.golang.org/p/eJ4_Pd9X91c

The CLI output I'm seeing is:

original size 60
chunk bookends 0 20
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
chunk bookends 20 40
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
chunk bookends 40 60
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,

The problem here is that while the chunk bookends are being calculated correctly, the x value is starting at 0 each time. I think I should expect it to start at the chunk bookend minimum, which would be 0, 20, 40, etc. How come the range is starting at zero each time?

Source:

package main

import (
    "fmt"
    "math/rand"
    "strconv"
)

func main() {
    items := make(map[string]int)

    // Generate some fake data for our testing, in reality this could be 1m entries
    for i := 0; i < 60; i ++ {
        // int as strings are intentional here
        items[strconv.FormatInt(int64(rand.Int()), 10)] = rand.Int()
    }

    // Create a map of just keys so we can easily chunk based on the numeric keys
    i := 0
    keys := make([]string, len(items))
    for k := range items {
            keys[i] = k
            i++
    }

    fmt.Println("original size", len(keys))
    //batchContents := make(map[string]int)

    // Iterate numbers in the size batch we're looking for
    chunkSize := 20
    for chunkStart := 0; chunkStart < len(keys); chunkStart += chunkSize {
        chunkEnd := chunkStart + chunkSize

        if chunkEnd > len(items) {  
            chunkEnd = len(items)
        }

        // Iterate over the keys
        fmt.Println("chunk bookends", chunkStart, chunkEnd)
        for x := range keys[chunkStart:chunkEnd] {
            fmt.Print(x, ",")

            // Build the batch contents with the contents needed from items
            // @todo is there a more efficient approach?
            //batchContents[keys[i]] = items[keys[i]]
        }
        fmt.Println()

        // @todo POST final batch contents
        //fmt.Println(batchContents)
    }

}

When you process a chunk:

for x := range keys[chunkStart:chunkEnd] {}

You are iterating over a slice, and having one iteration variable, it will be the slice index, not the element from the slice (at the given index). Hence it will always start at 0 . (When you iterate over a map, first iteration variable is the key because there is no index there, and the second is the value associated with that key.)

Instead you want this:

for _, key := range keys[chunkStart:chunkEnd] {}

Also note that it's redundant to first collect the keys in a slice, and then process them. You may do that when iterating over the map once, at first. Just keep a variable counting the iterations to know when you reach the chunk size, which may be implicit if you use data structures that keeps this (eg the size of a keys batch slice).

For example (try it on the Go Playground ):

chunkSize := 20
batchKeys := make([]string, 0, chunkSize)
process := func() {
    fmt.Println("Batch keys:", batchKeys)
    batchKeys = batchKeys[:0]
}

for k := range items {
    batchKeys = append(batchKeys, k)
    if len(batchKeys) == chunkSize {
        process()
    }
}
// Process last, potentially incomplete batch
if len(batchKeys) > 0 {
    process()
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM