Pymongo inefficient query
I currently have the following piece of code:
houses = self.database[self.database_name][constants.DATABASE_HOUSES_COLLECTION]
# Legacy bulk API from PyMongo 3.x; PyMongo 4 removed it in favour of bulk_write().
bulk_houses = houses.initialize_unordered_bulk_op()
for house in houses.find().skip(self.from_index).limit(
        constants.MAX_HOUSE_FUNCTION_DOCUMENTS_PER_THREAD):
    # geopy expects (latitude, longitude) order.
    house_coords = (house.get("latitude"), house.get("longitude"))
    minimum = 10000  # sentinel: no document found in this city yet
    # Fetch only the coordinates of documents in the same city.
    for c in self.collection.find({"city": house.get("city")}, {"longitude": 1, "latitude": 1}):
        collection_coords = (c.get("latitude"), c.get("longitude"))
        distance = geopy.distance.distance(collection_coords, house_coords).km
        if distance < minimum:
            minimum = distance
    if minimum == 10000:
        minimum = None
    bulk_houses.find({"_id": house.get("_id")}).update(
        {"$set": {f"demography.distanceClosest{translated.get(self.collection.name)}": minimum}})
bulk_houses.execute()
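It iterates over every house in the houses collection.
For each house, it queries the second collection (passed in beforehand), fetching only the longitude and latitude.
It then computes the shortest distance to any document in the same city.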
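To make the per-house step concrete, here is a minimal sketch that needs neither MongoDB nor geopy. It assumes coordinates are (latitude, longitude) pairs in degrees and substitutes a haversine great-circle formula for geopy's geodesic distance, so its results will differ slightly from geopy's. `haversine_km` and `closest_distance_km` are hypothetical names, not part of the code above.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0088 * math.asin(math.sqrt(h))  # mean Earth radius in km

def closest_distance_km(house, candidates):
    """Smallest distance from house to any candidate, or None when there are none."""
    return min((haversine_km(house, c) for c in candidates), default=None)

house = (52.37, 4.90)  # hypothetical (latitude, longitude)
print(closest_distance_km(house, [(52.37, 4.90), (51.92, 4.48)]))  # 0.0
print(closest_distance_km(house, []))  # None
```

Passing `default=None` to `min` covers the no-match case without a sentinel value such as 10000.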
My function runs in multiple threads, and I call it like this:
houses_count = self.houses.count_documents({})
batch = constants.MAX_HOUSE_FUNCTION_DOCUMENTS_PER_THREAD
for collection in self.collections:
    # Queue one task per batch of houses, for every collection.
    for x in range(0, houses_count, batch):
        match_demography_house = MatchDemographyHouse(
            collection, self.mongo_db, constants.DATABASE_NAME, x, x + batch)
        match_demography_house.add_to_pool(self.match_house_demography_executor)
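As you can imagine, this is very inefficient. Adding an index on city sped it up slightly, and fetching only the longitude and latitude gave another small improvement.
Iterating over just over 1,000 houses takes 1 minute, and the collection being searched has 240 documents. Each thread currently processes 50 houses.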
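The 50-houses-per-thread batching can be sketched in isolation as follows. This is only an illustration under assumptions: `chunk_ranges` and `process_chunk` are hypothetical helpers, `max_workers=4` is arbitrary, and the real per-batch work (the MongoDB queries and distance calculations) is replaced by a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_ranges(total, size):
    """Yield (start, stop) index pairs that cover range(total) in steps of size."""
    for start in range(0, total, size):
        yield start, min(start + size, total)

def process_chunk(start, stop):
    # Placeholder for the real per-batch work; it only reports the batch size.
    return stop - start

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(process_chunk, start, stop)
               for start, stop in chunk_ranges(1000, 50)]
    handled = sum(f.result() for f in futures)

print(handled)  # 1000
```

MongoDB does not guarantee a stable document order for `skip()`/`limit()` without an explicit sort, so a real implementation would probably want to sort (for example on `_id`) before slicing.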
Try this test harness. On my machine it runs in under a second, with no index:
import datetime
import random

import geopy.distance
import pymongo

db = pymongo.MongoClient()['testhouses']
db.testhouses.delete_many({})
db.testcollection.delete_many({})

# Seed 1000 houses; the first 240 also get a matching document in testcollection.
for i in range(1000):
    latitude = random.randint(-89, 89)  # valid latitudes lie within [-90, 90]
    longitude = random.randint(-180, 180)
    city = f'City {i}'
    db.testhouses.insert_one({'city': city, 'longitude': longitude, 'latitude': latitude})
    if i < 240:
        db.testcollection.insert_one({'city': city, 'longitude': longitude, 'latitude': latitude})

start_time = datetime.datetime.now()
# Collect the updates and send them in a single unordered bulk_write() call.
updates = []
for house in db.testhouses.find():
    # geopy expects (latitude, longitude) order.
    house_coords = (house.get("latitude"), house.get("longitude"))
    minimum = 10000
    # Fetch only the coordinates of documents in the same city.
    for c in db.testcollection.find({"city": house.get("city")}, {"longitude": 1, "latitude": 1}):
        collection_coords = (c.get("latitude"), c.get("longitude"))
        distance = geopy.distance.distance(collection_coords, house_coords).km
        if distance < minimum:
            minimum = distance
    if minimum == 10000:
        minimum = None
    updates.append(pymongo.UpdateOne({"_id": house.get("_id")},
                                     {"$set": {"demography.distanceClosest": minimum}}))
result = db.testhouses.bulk_write(updates, ordered=False)
print(f'Bulk updates: {result.modified_count} updated')
# total_seconds() reports the full elapsed time, not just the sub-second part.
print(f'Time taken: {(datetime.datetime.now() - start_time).total_seconds()} seconds')
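If the per-house queries turn out to dominate, one possible refinement (not part of the harness above, and untested against the original data) is to read the 240-document collection once, group it by city in memory, and then look up each house's candidates in a dictionary. The sketch below assumes each document has `city`, `latitude` and `longitude` fields. `group_by_city` and `closest_per_house` are hypothetical names, and `math.dist` is only a dependency-free stand-in for a geographic distance function.

```python
from collections import defaultdict
import math

def group_by_city(docs):
    """Map each city to the list of (latitude, longitude) pairs found in docs."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["city"]].append((doc["latitude"], doc["longitude"]))
    return grouped

def closest_per_house(houses, grouped, distance_fn):
    """Return {house _id: nearest distance, or None if the city has no points}."""
    result = {}
    for house in houses:
        here = (house["latitude"], house["longitude"])
        candidates = grouped.get(house["city"], ())
        result[house["_id"]] = min(
            (distance_fn(here, there) for there in candidates), default=None)
    return result

# Plain dicts stand in for the documents one find() per collection would return.
reference = [{"city": "A", "latitude": 0.0, "longitude": 0.0},
             {"city": "A", "latitude": 3.0, "longitude": 4.0}]
houses = [{"_id": 1, "city": "A", "latitude": 3.0, "longitude": 0.0},
          {"_id": 2, "city": "B", "latitude": 1.0, "longitude": 1.0}]

# Euclidean math.dist is a placeholder; it is not a geographic distance.
print(closest_per_house(houses, group_by_city(reference), math.dist))
# {1: 3.0, 2: None}
```

With geopy installed, `distance_fn` could be `lambda a, b: geopy.distance.distance(a, b).km`, and the resulting dictionary could feed a single bulk update.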