Sum Values of Python Dictionary (Time/Space Complexity)

I am attempting to solve the following problem:

Given the list of birth dates and death dates, find the year in which the most people were alive.

Here is my code thus far:

b = [1791, 1796, 1691, 1907, 1999, 2001, 1907] # birth dates
d = [1800, 1803, 1692, 1907, 1852, 1980, 2006] # death dates

year_dict = {} # populates dict key as year, val as total living/dead
for birth in b:
    year_dict.setdefault(birth,0) # sets default value of key to 0 
    year_dict[birth] += 1 # will add +1 for each birth and sums duplicates
for death in d:
    year_dict.setdefault(death,0) # sets default value of key to 0
    year_dict[death] += -1 # will add -1 for each death and sums duplicates

After this code runs, year_dict contains:

{1791: 1, 1796: 1, 1691: 1, 1907: 1, 1999: 1, 2001: 1, 1800: -1, 1803: -1, 1692: -1, 1852: -1, 1980: -1, 2006: -1}

Now I am looking for a way to create a running sum to find which year has the most people living, example:

[Image: desired result]

As we can see, the result shows that 1796 had the most people alive for the given data. I am having trouble with the running sum, which needs to walk the years in order and add each year's value to the total of all the values before it. I have tried several different loops and enumerate, but am now stuck. Once I find the best way of resolving this, I will wrap it in a function.

If there is a more efficient way of doing this, taking time and space complexity into account, please let me know. I am trying to learn to write efficient Python. I really appreciate your help!

Is there a particular data structure you want to store the result in? The code below prints the same result as the imgur link to the terminal, and also stores it in a dictionary.

from collections import OrderedDict

b = [1791, 1796, 1691, 1907, 1999, 2001, 1907] # birth dates
d = [1800, 1803, 1692, 1907, 1852, 1980, 2006] # death dates

year_dict = {} # populates dict key as year, val as total living/dead
for birth in b:
    year_dict.setdefault(birth,0) # sets default value of key to 0 
    year_dict[birth] += 1 # will add +1 for each birth and sums duplicates
for death in d:
    year_dict.setdefault(death,0) # sets default value of key to 0
    year_dict[death] += -1 # will add -1 for each death and sums duplicates

year_dict = OrderedDict(sorted(year_dict.items(), key=lambda t: t[0]))
solution_dict = {}

total = 0
print('year net_living running_sum')
for year in year_dict:
    total += year_dict[year]
    solution_dict[year] = {'net_living': year_dict[year],
                           'running_sum': total}
    print('{} {:4} {:10}'.format(year, year_dict[year], total))

Outputs:

year net_living running_sum
1691    1          1
1692   -1          0
1791    1          1
1796    1          2
1800   -1          1
1803   -1          0
1852   -1         -1
1907    1          0
1980   -1         -1
1999    1          0
2001    1          1
2006   -1          0

Contents of solution_dict:

{
1691: {'net_living':  1, 'running_sum':  1},
1692: {'net_living': -1, 'running_sum':  0},
1791: {'net_living':  1, 'running_sum':  1},
1796: {'net_living':  1, 'running_sum':  2},
1800: {'net_living': -1, 'running_sum':  1},
1803: {'net_living': -1, 'running_sum':  0},
1852: {'net_living': -1, 'running_sum': -1},
1907: {'net_living':  1, 'running_sum':  0},
1980: {'net_living': -1, 'running_sum': -1},
1999: {'net_living':  1, 'running_sum':  0},
2001: {'net_living':  1, 'running_sum':  1},
2006: {'net_living': -1, 'running_sum':  0}
}
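Since the question mentions wrapping this in a function and asks about complexity, here is a minimal sketch of the same running-sum idea using collections.Counter and itertools.accumulate. The function name peak_year is hypothetical, and it keeps the convention of the code above: a death in a given year lowers that year's count. Sorting the distinct years dominates the cost, so this should run in roughly O(n log n) time and O(n) space for n dates.

```python
from collections import Counter
from itertools import accumulate

def peak_year(births, deaths):
    """Return (year, count) at the first peak of the running sum (hypothetical helper)."""
    net = Counter(births)      # +1 for every birth, duplicates are summed
    net.subtract(deaths)       # -1 for every death; Counter allows negative values
    years = sorted(net)        # distinct years in chronological order
    running = list(accumulate(net[y] for y in years))
    # Index of the first maximum of the running sum.
    best = max(range(len(years)), key=running.__getitem__)
    return years[best], running[best]

b = [1791, 1796, 1691, 1907, 1999, 2001, 1907]  # birth dates
d = [1800, 1803, 1692, 1907, 1852, 1980, 2006]  # death dates
print(peak_year(b, d))  # (1796, 2)
```

If several years tie for the maximum, this sketch returns the earliest one.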

I would use pandas and its DataFrame object.

Make a dataframe of each person's year of birth and year of death:

import numpy as np
import pandas as pd
born = [1791, 1796, 1691, 1907, 1999, 2001, 1907] # birth dates
died = [1800, 1803, 1692, 1907, 1852, 1980, 2006] # death dates
people = pd.DataFrame({'born': born, 'died': died}) # one row per person

Make a dataframe indexed by every year from the first listed birth to the last listed death:

years = pd.DataFrame(index=np.arange(people['born'].min(), people['died'].max() + 1))

Find the total number of people alive for each of those years:

for year in years.index:
    num_living = ((year > people['born']) & (year < people['died'])).sum()
    years.loc[year, 'total_living'] = num_living

Calling years.tail() yields the following:

    total_living
2002    1.0
2003    1.0
2004    1.0
2005    1.0
2006    0.0

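From there, you can take an argmax of the 'total_living' column, for example with years['total_living'].idxmax(), which returns the index label (here, the year) of the first maximum.

To be clear, I assumed that people die after they are born, and (therefore) that the number of people alive is never negative. Because the comparisons are strict, a person is not counted in their own birth year or death year.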
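For readers without pandas, the per-year count and the argmax step can be sketched in plain Python. This is only an illustration of the same logic, not a replacement for the answer: living_per_year is a hypothetical helper, and it keeps the strict comparisons from the loop above. It checks every person for every year, so it costs roughly O(years × people) time.

```python
born = [1791, 1796, 1691, 1907, 1999, 2001, 1907]  # birth dates
died = [1800, 1803, 1692, 1907, 1852, 1980, 2006]  # death dates

def living_per_year(born, died):
    """Map each year from min(born) to max(died) to the number of people alive."""
    counts = {}
    for year in range(min(born), max(died) + 1):
        # Strict comparisons: birth and death years themselves are not counted.
        counts[year] = sum(b < year < d for b, d in zip(born, died))
    return counts

counts = living_per_year(born, died)
peak = max(counts, key=counts.get)  # first year with the highest count
print(peak, counts[peak])  # 1797 2
```

With strict comparisons the peak comes out as 1797 rather than 1796, because the person born in 1796 is only counted from the following year.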

