简体   繁体   中英

Filtering unique names of an array with massive amount of objects

I'm currently fetching a lot of objects that contains both names and coordinates of streets. The returned array has around 22.000 objects and the resulting array we want has around 4000, the rest are duplicates. The problem with this kind of data is that the fetched objects can have the same name but different coordinates, and i'm only interested by getting objects based on unique names. If there are more than one object with the same name, i'll only want to keep the first object.

So far i've been trying to loop through the streets by comparing the names. I'd rather use filter or some other more performance efficient solution.

My struct

struct StreetName {
    var name: String
    var polyLine: CLLocationCoordinate2D
}

My code so far

DataManager.shared.getStreetNames { (streets) in  
    var namesArray: [StreetName] = []
    for streetName in streets {
        let name = streetName.name
        if namesArray.count == 0 {
            namesArray.append(streetName)
        } else if namesArray.contains(where: {$0.name == name }) { 
             /* Dont add */ 
        } else {
             namesArray.append(streetName)
        }
    }

    self.streetNames = namesArray.sorted(by: {$0.name < $1.name})
    self.filteredStreetNames = self.streetNames
    OperationQueue.main.addOperation {
        self.streetTableView.reloadData()
    }
}

This code block works, but runs in around 30 seconds on an iPhone X. Which is way too slow. Any ideas?

I think if you profile this, you'll find that the sort is taking the most time. I can't find an official note, but there's a good chance that the underlying implementation is quick sort , which has it's worst complexity when the array is already sorted (or the array is sorted in reverse order).

The average case complexity for quick sort is O(n log n), but in the worst case it's O(n 2 ).

I think you should implement insertion sort instead, or, more accurately, always insert the new elements into an already sorted position. This should reduce your complexity to O(n) for the entire function.

Pseudocode:

  • Fetch street names
  • For each street name
    • find the position in the existing array where the street name would go (I suggest binary search since the array is already sorted)
    • if the street name already exists, skip
    • if the name doesn't exist, insert it.

The result should be a sorted array of unique street names requiring each name to only be read and inserted once.

My take on this:

// Given an array of elements (here just Ints):
let array = (0..<1000).map { _ in Int(arc4random_uniform(100)) }

// Sort it:
let sorted = array.sorted()

// Define an empty result (array of elements) which is a variable 
// and which gets modified in the subsequent reduce function:
var unique: [Int] = []

// A tailored reduce which depends on a sorted array and appends 
// to the result IFF that element is not the last in result:
let result = sorted.reduce(into: unique) { (result, element) in
    if let last = result.last, last == element {
    } else {
        result.append(element)
    }
}

Finally, print the result:

print(array)

Example output on the console: console [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

@MartinR solved this by using Sets .

My new updated struct

struct StreetName: Hashable {
    static func == (lhs: StreetName, rhs: StreetName) -> Bool {
        return lhs.name == rhs.name
    }

    var hashValue: Int {
        return name.hashValue
    }

    var name: String
    var polyLine: CLLocationCoordinate2D
}

My new updated code

DataManager.shared.getStreetNames { (returnedNamesSet) in
    var namesArray: [StreetName] = Array(returnedNamesSet)

    self.streetNames = namesArray.sorted(by: {$0.name < $1.name})
    self.filteredStreetNames = self.streetNames
    OperationQueue.main.addOperation {
        self.streetTableView.reloadData()
    }
}


Results:

The process time went from 30 seconds to 0.4 seconds by using Set

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM