Add unique values in an array as a value in concurrent map golang?

I am iterating over the flatProduct.Catalogs slice and populating my productCatalog concurrent map in Go. I am using the Upsert method so that only unique product IDs are added to the productCatalog map.

The callback uses a linear scan to check for duplicate product IDs, but I have more than 700k product IDs, so it is very slow. I am looking for ways to make it more efficient.

The code below is called by multiple goroutines in parallel, which is why I am using a concurrent map to hold the data.

var productRows []ClientProduct
err = json.Unmarshal(byteSlice, &productRows)
if err != nil {
    return err
}
for i := range productRows {
    flatProduct, err := r.Convert(spn, productRows[i])
    if err != nil {
        return err
    }
    if flatProduct.StatusCode == definitions.DONE {
        continue
    }
    // ProductId is an int64 (see the type assertion below), so format it with FormatInt.
    r.products.Set(strconv.FormatInt(flatProduct.ProductId, 10), flatProduct)
    for _, catalogId := range flatProduct.Catalogs {
        catalogValue := strconv.FormatInt(int64(catalogId), 10)
        // How can I make the Upsert below on `productCatalog` run faster?
        r.productCatalog.Upsert(catalogValue, flatProduct.ProductId, func(exists bool, valueInMap interface{}, newValue interface{}) interface{} {
            productID := newValue.(int64)
            if valueInMap == nil {
                return []int64{productID}
            }
            oldIDs := valueInMap.([]int64)

            for _, id := range oldIDs {
                if id == productID {
                    // Already exists, don't add duplicates.
                    return oldIDs
                }
            }
            return append(oldIDs, productID)
        })
    }
}

The Upsert code above is very slow for me; it takes a long time to add unique product IDs as the value in my concurrent map. Here is how productCatalog is defined:

productCatalog *cmap.ConcurrentMap

Here is the Upsert method I am using: https://github.com/orcaman/concurrent-map/blob/master/concurrent_map.go#L56

This is how I read data from this cmap:

catalogProductMap := clientRepo.GetProductCatalogMap()
productIds, ok := catalogProductMap.Get("200")
var data = productIds.([]int64)
for _, pid := range data {
  ...
}

To summarize answers from comments:

Each Upsert call scans the existing slice, so a single call is O(n), where n is the current length of the slice. Building a slice of n unique IDs this way is therefore O(n**2) overall.
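To make the quadratic growth concrete, here is a minimal sketch that counts the comparisons made by the same linear-scan check. The helper names appendUnique and countComparisons are made up for this illustration, and it assumes the worst case where every ID is new and lands in the same catalog.

```go
package main

import "fmt"

// appendUnique is the linear-scan check from the question.
// It also reports how many comparisons it made (illustration only).
func appendUnique(ids []int64, id int64) ([]int64, int) {
	for i, old := range ids {
		if old == id {
			return ids, i + 1 // found a duplicate after i+1 comparisons
		}
	}
	return append(ids, id), len(ids) // scanned the whole slice
}

// countComparisons inserts n distinct IDs and returns the total comparisons.
func countComparisons(n int) int {
	var ids []int64
	total := 0
	for i := 0; i < n; i++ {
		var c int
		ids, c = appendUnique(ids, int64(i))
		total += c
	}
	return total
}

func main() {
	// Doubling n roughly quadruples the work: the total is n*(n-1)/2.
	for _, n := range []int{1000, 2000, 4000} {
		fmt.Println(n, countComparisons(n))
	}
}
```

With 700k distinct IDs in one catalog the same formula gives on the order of 10**11 comparisons, which would explain the slowdown if most IDs share a few catalogs.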

The problem, as you also mentioned, is iterating through the whole slice to find a duplicate. This can be avoided by using another map as a set.

Example:

r.productCatalog.Upsert(catalogValue, flatProduct.ProductId, func(exists bool, valueInMap interface{}, newValue interface{}) interface{} {
    productID := newValue.(int64)
    if valueInMap == nil {
        return map[int64]struct{}{productID: {}}
    }
    oldIDs := valueInMap.(map[int64]struct{})
    
    // value is irrelevant, no need to check if key exists 
    oldIDs[productID] = struct{}{}
    return oldIDs
})
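For a version that can be run on its own, the sketch below pairs the same set-based callback with a small mutex-guarded map. lockedMap, addToSet and sortedIDs are hypothetical names standing in for the real concurrent map; only the callback shape is taken from the question. It also shows the reading side, which has to change because the stored value is now a map[int64]struct{} rather than a []int64.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// upsertCb mirrors the callback shape used in the question.
type upsertCb func(exists bool, valueInMap interface{}, newValue interface{}) interface{}

// lockedMap is a hypothetical stand-in for the concurrent map:
// a single mutex guarding a plain map.
type lockedMap struct {
	mu sync.Mutex
	m  map[string]interface{}
}

func newLockedMap() *lockedMap {
	return &lockedMap{m: make(map[string]interface{})}
}

// Upsert runs cb while holding the lock and stores whatever cb returns.
func (l *lockedMap) Upsert(key string, value interface{}, cb upsertCb) interface{} {
	l.mu.Lock()
	defer l.mu.Unlock()
	old, ok := l.m[key]
	res := cb(ok, old, value)
	l.m[key] = res
	return res
}

// addToSet is the set-based callback: inserting a key is O(1) on average
// and inserting an existing key is a no-op, so no duplicate scan is needed.
func addToSet(exists bool, valueInMap interface{}, newValue interface{}) interface{} {
	productID := newValue.(int64)
	if !exists {
		return map[int64]struct{}{productID: {}}
	}
	set := valueInMap.(map[int64]struct{})
	set[productID] = struct{}{}
	return set
}

// sortedIDs copies the keys of the set out while holding the lock,
// so the caller never iterates a map that writers may still be mutating.
func (l *lockedMap) sortedIDs(key string) []int64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	set, _ := l.m[key].(map[int64]struct{})
	ids := make([]int64, 0, len(set))
	for id := range set {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	m := newLockedMap()
	var wg sync.WaitGroup
	// Four goroutines insert the same 1000 IDs into catalog "200".
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := int64(0); id < 1000; id++ {
				m.Upsert("200", id, addToSet)
			}
		}()
	}
	wg.Wait()
	fmt.Println(len(m.sortedIDs("200"))) // prints 1000
}
```

One design point to keep in mind: the callback mutates the stored map in place, so a reader that fetches the map and ranges over it while writers are still running would race. If reads and writes overlap, copying the keys out under the lock (as sortedIDs does here) is one way to avoid that.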

Won't a nested map add a lot of allocations and therefore a lot of memory usage?

Nope, the empty struct values themselves cost nothing: struct{} has size zero, so the values add no allocations or memory. The map still has to store its keys and buckets, so expect it to use somewhat more memory than a plain slice holding the same IDs. You can find plenty of articles/questions about the empty struct and its usage (e.g. "What uses a type with empty struct has in Go?").
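A quick way to check the size claim yourself is unsafe.Sizeof. This is only a sketch; emptySize and boolSize are helper names invented for the illustration, and bool is included because it is the usual alternative value type for a set.

```go
package main

import (
	"fmt"
	"unsafe"
)

// emptySize reports the size in bytes of a struct{} value.
func emptySize() uintptr { return unsafe.Sizeof(struct{}{}) }

// boolSize reports the size in bytes of a bool value.
func boolSize() uintptr { return unsafe.Sizeof(true) }

func main() {
	fmt.Println(emptySize(), boolSize()) // prints 0 1
}
```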

Note: you could instead keep the slice and use an optimised search, such as the binary search provided by sort.Search, but that requires the slice to be kept sorted.
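A minimal sketch of that sorted-slice variant, assuming the IDs are int64 as in the question; insertSorted is a made-up helper name. The lookup is O(log n), but inserting a new ID still shifts the tail of the slice, so each insert remains O(n) in the worst case (a single memory move rather than n comparisons).

```go
package main

import (
	"fmt"
	"sort"
)

// insertSorted returns ids with id added, keeping ids sorted and free of duplicates.
func insertSorted(ids []int64, id int64) []int64 {
	// Find the first position whose element is >= id.
	i := sort.Search(len(ids), func(k int) bool { return ids[k] >= id })
	if i < len(ids) && ids[i] == id {
		return ids // already present
	}
	ids = append(ids, 0)     // grow by one element
	copy(ids[i+1:], ids[i:]) // shift the tail one slot to the right
	ids[i] = id
	return ids
}

func main() {
	var ids []int64
	for _, id := range []int64{5, 1, 3, 1, 5} {
		ids = insertSorted(ids, id)
	}
	fmt.Println(ids) // prints [1 3 5]
}
```

Inside the Upsert callback this would replace the loop and the final append, with the reading code left unchanged since the value stays a []int64.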

