
Django multiple queries optimization

I have a Django view that returns a matrix of the normalized edit distances between thousands of strings. These strings are fields of the MyModel class. I calculate the distances in the myView function.

I profiled this code and found that accessing the queryset inside the loops consumes a lot of time.

How could I optimize this?

# models.py
from django.db import models

class MyModel(models.Model):
    str1 = models.CharField(max_length=300)
    str2 = models.CharField(max_length=300)

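# views.py
import json

import Levenshtein
import numpy
from django.http import HttpResponse

from .models import MyModel  # assumes views.py and models.py are in the same app

def compare(a, b):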
    return Levenshtein.distance(a, b) / max(len(a), len(b))

def myView(request):    
    query_set = MyModel.objects.filter(....)
    size = query_set.count()

    arr = numpy.zeros(size ** 2).reshape(size, size)

    for i in range(size):
        m1 = query_set[i].str1
        for j in range(size):
            m2 = query_set[j].str1
            arr[i][j] = compare(m1, m2)

    json_out = json.dumps({'data': arr.tolist()})
    return HttpResponse(json_out, content_type="application/json")

EDIT

I think the problem is related to database access, because I tried a similar approach that reads the data from an external text file instead, and it was much faster:

# file.txt
[{'par1': ....}, {'par1': ....}, ...]

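# views.py
import ast

def myView(request):
    with open('file.txt', 'r') as f:
        # The file holds a Python-style list of dicts, so parse it rather than using the raw text
        data = ast.literal_eval(f.read())
    size = len(data)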

    arr = numpy.zeros(size ** 2).reshape(size, size)

    for i in range(size):
        for j in range(size):
            m1 = data[i]['par1']
            m2 = data[j]['par1']
            arr[i][j] = compare(m1, m2)

    json_out = json.dumps({'data': arr.tolist()})
    return HttpResponse(json_out, content_type="application/json")

How many queries is myView actually running? It should be running 1, or possibly 2: one for count() and one for the actual data. I would start by verifying that. I use https://github.com/dobarkod/django-queryinspect, but most people use https://github.com/jazzband/django-debug-toolbar to find out how many queries are being made.
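
If the check shows one query per loop iteration, which is what indexing an unevaluated queryset such as query_set[i] produces, one option is to load the strings into a list once and do all comparisons on that list. The following is a minimal sketch of that idea, not a drop-in view. Several things in it are assumptions: levenshtein is a pure-Python stand-in for Levenshtein.distance so the example runs without third-party packages, distance_matrix is a hypothetical helper name, and the sample strings are made up. It also computes only the upper triangle and mirrors it, since the distance is symmetric.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Pure-Python edit distance; a stand-in for Levenshtein.distance."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def compare(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    # Two empty strings are identical; this also avoids dividing by zero.
    return levenshtein(a, b) / longest if longest else 0.0


def distance_matrix(strings: Sequence[str]) -> List[List[float]]:
    """Hypothetical helper: pairwise normalized distances for in-memory strings."""
    size = len(strings)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        # The distance is symmetric, so compute each pair once and mirror it.
        for j in range(i + 1, size):
            d = compare(strings[i], strings[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Made-up sample data standing in for the values read from the database.
strings = ["kitten", "sitting", "kitten"]
matrix = distance_matrix(strings)
```

In the view, the list would come from a single query, for example strings = list(query_set.values_list('str1', flat=True)), and the result can be serialized with json.dumps({'data': matrix}) as before.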

