Is there an efficient way to structure data into a Python list or tuple?

Question

Is there an efficient way to structure data fetched from a database? Currently I use append, but it becomes slow, especially when more than 1k rows are fetched.

courses = []
for i in departments:
    for course in i.course_set.all():
        program = [course.name]
        info_list = []
        for year in range(1, 5):
            info = [year, (['Regular'], ['Irregular'])]
            info_list.append(info)

            count = 0
            for gender in ['male', 'female']:
                regular = students.filter(year_level=year, status='regular', course=course, gender=gender).count()
                irregular = students.filter(year_level=year, status='irregular', course=course, gender=gender).count()

                info[1][0].append(regular)
                info[1][1].append(irregular)

            regular = students.filter(year_level=year, status='regular', course=course, gender=gender).count()
            irregular = students.filter(year_level=year,status='irregular', course=course, gender=gender).count()

            info[1][0].append(regular)
            info[1][1].append(irregular)
            info[1][2].append(cross)

            total = regular + irregular
            count += total
            info.append(count)

        program.append(info_list)

        male = students.filter(gender='male', course=course).count()
        female = students.filter(gender='female', course=course).count()
        overall = students.filter(gender=None, course=course).count()

        program.append(["", "Total", "", male, female, overall, overall])
        courses.append(program)

I'm not sure whether the slowdown comes from the appends or from the queries hitting the database multiple times. Once I have the data, I will use it to output PDF tables.

Answer 1

The inefficiency you are experiencing does not come from the Python data structure or from how you process the data. It comes from running many separate Django queries inside nested loops. I may not have a full understanding of the data you are dealing with, but from what I can see you are essentially trying to get a breakdown of student totals for each category (gender, status, course, etc.) and then get subtotals and overall totals.
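
To put a rough number on that, here is a back-of-the-envelope sketch that counts the queries in the loop as posted in the question. The function name and the example department and course counts are made up for illustration, and it assumes each .count() call and each course_set.all() is one database round trip.

```python
def estimated_queries(num_departments, courses_per_department,
                      years=4, genders=2):
    """Rough count of database queries issued by the question's nested loops."""
    # Inside the gender loop: one regular and one irregular count per gender.
    per_gender_loop = genders * 2
    # After the gender loop: one more regular and one more irregular count.
    per_year = per_gender_loop + 2
    # Per course: the per-year counts plus the male/female/overall totals.
    per_course = years * per_year + 3
    # One course_set.all() query per department, then the per-course queries.
    return num_departments * (1 + courses_per_department * per_course)


# Hypothetical school: 5 departments with 4 courses each.
print(estimated_queries(5, 4))  # 545
```

Each of those round trips costs far more than a list append, which is why the time goes to the database rather than to building the lists.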

A better way is to use only one or two Django queries and let the database do most of the work for you. The results can then be processed in memory in Python to get the subtotals.

For example, let's say you need the breakdown of student count by gender, status, course name and year. You could get this data from the database in one call:

from django.db.models import Count

results = list(Student.objects.values('year_level', 'status', 'course__name', 'gender').annotate(total=Count('id')))

results will contain a list of dictionaries, something like this:

results = [
    {'year_level': 1, 'status': 'regular', 'course__name': 'Course A', 'gender': 'female', 'total': 2}, 
    {'year_level': 1, 'status': 'regular', 'course__name': 'Course A', 'gender': 'male', 'total': 2}, 
    {'year_level': 2, 'status': 'regular', 'course__name': 'Course B', 'gender': 'female', 'total': 1}, 
    {'year_level': 2, 'status': 'regular', 'course__name': 'Course B', 'gender': 'male', 'total': 1}, 
    {'year_level': 3, 'status': 'regular', 'course__name': 'Course C', 'gender': 'male', 'total': 1}
]

So you get a total count for every combination of the grouping fields. If you then want to combine these into subtotals, you can use functools.reduce on the results (note that the initial value is passed as the third positional argument):

from functools import reduce

male_students = reduce(
   lambda total, result: total + (result['total'] if result['gender'] == 'male' else 0),
   results, 0
)
female_students = reduce(
   lambda total, result: total + (result['total'] if result['gender'] == 'female' else 0),
   results, 0
)
all_students = reduce(lambda total, result: total + result['total'], results, 0)
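
If reduce with a lambda feels heavy, the same subtotals can also be written with the built-in sum and a generator expression. This is just an equivalent sketch, run here against the sample rows shown earlier rather than a real query:

```python
# Sample rows shaped like the output of the values().annotate() query.
results = [
    {'year_level': 1, 'status': 'regular', 'course__name': 'Course A', 'gender': 'female', 'total': 2},
    {'year_level': 1, 'status': 'regular', 'course__name': 'Course A', 'gender': 'male', 'total': 2},
    {'year_level': 2, 'status': 'regular', 'course__name': 'Course B', 'gender': 'female', 'total': 1},
    {'year_level': 2, 'status': 'regular', 'course__name': 'Course B', 'gender': 'male', 'total': 1},
    {'year_level': 3, 'status': 'regular', 'course__name': 'Course C', 'gender': 'male', 'total': 1},
]

# Add up the per-group counts, keeping only the rows that match.
male_students = sum(r['total'] for r in results if r['gender'] == 'male')
female_students = sum(r['total'] for r in results if r['gender'] == 'female')
all_students = sum(r['total'] for r in results)

print(male_students, female_students, all_students)  # 4 3 7
```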

Another example broken down by year:

students_by_year = {1: {}, 2: {}, 3: {}, 4: {}}
for year in range(1, 5):
    students_by_year[year]['male'] = reduce(
        lambda total, result: total + (result['total']
            if result['gender'] == 'male' and result['year_level'] == year else 0
        ), results, 0)
    students_by_year[year]['female'] = reduce(
        lambda total, result: total + (result['total']
            if result['gender'] == 'female' and result['year_level'] == year else 0
        ), results, 0)

Modifying the "if" condition in the lambda function for one or more specific criteria will get you the subtotals.
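
One way to avoid rewriting the condition each time is a small helper that takes the criteria as keyword arguments. The name subtotal is hypothetical (it is not part of Django), and the rows are the sample data from earlier in this answer:

```python
results = [
    {'year_level': 1, 'status': 'regular', 'course__name': 'Course A', 'gender': 'female', 'total': 2},
    {'year_level': 1, 'status': 'regular', 'course__name': 'Course A', 'gender': 'male', 'total': 2},
    {'year_level': 2, 'status': 'regular', 'course__name': 'Course B', 'gender': 'female', 'total': 1},
    {'year_level': 2, 'status': 'regular', 'course__name': 'Course B', 'gender': 'male', 'total': 1},
    {'year_level': 3, 'status': 'regular', 'course__name': 'Course C', 'gender': 'male', 'total': 1},
]


def subtotal(rows, **criteria):
    """Sum 'total' over the rows whose fields equal every keyword given."""
    return sum(
        row['total']
        for row in rows
        if all(row[field] == value for field, value in criteria.items())
    )


print(subtotal(results, gender='male', year_level=1))  # 2
print(subtotal(results, course__name='Course B'))      # 2
print(subtotal(results))                               # 7 (no criteria: grand total)
```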

At this point, whatever technique you use in Python to process the list of dictionaries will be faster than what you had. The source of the inefficiency was looping over the Django query results and issuing queries to the database one at a time.
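
Since the end goal is a PDF table, here is a sketch of turning the aggregated rows into table rows in a single pass. The row layout (year, regular male, regular female, irregular male, irregular female, year total) and the function name course_rows are my own assumptions, so adjust them to whatever your PDF library expects:

```python
from collections import defaultdict

results = [
    {'year_level': 1, 'status': 'regular', 'course__name': 'Course A', 'gender': 'female', 'total': 2},
    {'year_level': 1, 'status': 'regular', 'course__name': 'Course A', 'gender': 'male', 'total': 2},
    {'year_level': 2, 'status': 'regular', 'course__name': 'Course B', 'gender': 'female', 'total': 1},
    {'year_level': 2, 'status': 'regular', 'course__name': 'Course B', 'gender': 'male', 'total': 1},
    {'year_level': 3, 'status': 'regular', 'course__name': 'Course C', 'gender': 'male', 'total': 1},
]

# Index the counts once by (course, year, status, gender).
counts = defaultdict(int)
for row in results:
    key = (row['course__name'], row['year_level'], row['status'], row['gender'])
    counts[key] += row['total']


def course_rows(course, years=range(1, 5)):
    """Build one table row per year; combinations with no students become 0."""
    rows = []
    for year in years:
        cells = [
            counts.get((course, year, status, gender), 0)
            for status in ('regular', 'irregular')
            for gender in ('male', 'female')
        ]
        rows.append([year, *cells, sum(cells)])
    return rows


for table_row in course_rows('Course A'):
    print(table_row)
```

No further queries are made while building the rows; everything comes from the one aggregated result.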

If your dataset is very large (maybe several hundred thousand students), it becomes more efficient to construct a separate Django query for each specific subtotal you need, for example:

male_students_by_year = list(Student.objects.filter(gender='male').values('year_level').annotate(total=Count('id')))

The above would give you this result:

[ 
   {'year_level': 1, 'total': 2}, 
   {'year_level': 2, 'total': 1}, 
   {'year_level': 3, 'total': 1}
]
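
Note that year 4 is missing from that result because a group with no matching rows does not come back from the database at all. A small sketch for turning the list into a per-year lookup that fills in zeros, using the sample output above as input:

```python
# Sample output of the per-year query; year 4 has no male students.
male_students_by_year = [
    {'year_level': 1, 'total': 2},
    {'year_level': 2, 'total': 1},
    {'year_level': 3, 'total': 1},
]

# Map year -> total so each lookup is a dictionary access, not a list scan.
totals = {row['year_level']: row['total'] for row in male_students_by_year}

# Years absent from the query result default to 0.
male_per_year = [totals.get(year, 0) for year in range(1, 5)]
print(male_per_year)  # [2, 1, 1, 0]
```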

Hope this helps, please let me know if any of the above needs clarification.

solution1
1 ACCEPTED 2020-07-17 18:59:14
