I am attempting to solve the following problem:
Given the list of birth dates and death dates, find the year in which the most people were alive.
Here is my code thus far:
b = [1791, 1796, 1691, 1907, 1999, 2001, 1907] # birth dates
d = [1800, 1803, 1692, 1907, 1852, 1980, 2006] # death dates
year_dict = {} # populates dict key as year, val as total living/dead
for birth in b:
    year_dict.setdefault(birth,0) # sets default value of key to 0
    year_dict[birth] += 1 # will add +1 for each birth and sums duplicates
for death in d:
    year_dict.setdefault(death,0) # sets default value of key to 0
    year_dict[death] += -1 # will add -1 for each death and sums duplicates
The code above produces the following year_dict:
{1791: 1, 1796: 1, 1691: 1, 1907: 1, 1999: 1, 2001: 1, 1800: -1, 1803: -1, 1692: -1, 1852: -1, 1980: -1, 2006: -1}
Now I am looking for a way to create a running sum to find which year has the most people living, example:
As we can see, the result shows that 1796 had the most people alive for the given data sets. I am having trouble with the running sum portion, which needs to take each key's value and add it to the previous total. I have tried several different loops and enumerate, but am now stuck. Once I find the best way of resolving this, I will wrap it in a function.
If there is a more efficient way of doing this, taking time and space complexity into account, please let me know. I am trying to learn to write efficient Python. I really appreciate your help!
Is there a particular data structure you want to house the result in? I got the same result as the imgur link to print to the terminal. It would not be difficult to write it to a dictionary though.
from collections import OrderedDict
b = [1791, 1796, 1691, 1907, 1999, 2001, 1907] # birth dates
d = [1800, 1803, 1692, 1907, 1852, 1980, 2006] # death dates
year_dict = {} # populates dict key as year, val as total living/dead
for birth in b:
    year_dict.setdefault(birth,0) # sets default value of key to 0
    year_dict[birth] += 1 # will add +1 for each birth and sums duplicates
for death in d:
    year_dict.setdefault(death,0) # sets default value of key to 0
    year_dict[death] += -1 # will add -1 for each death and sums duplicates
year_dict = OrderedDict(sorted(year_dict.items(), key=lambda t: t[0]))
solution_dict = {}
total = 0
print('year net_living running_sum')
for year in year_dict:
    total += year_dict[year]
    solution_dict[year] = {'net_living': year_dict[year],
                           'running_sum': total}
    print('{} {:4} {:10}'.format(year, year_dict[year], total))
Outputs:
year net_living running_sum
1691    1          1
1692   -1          0
1791    1          1
1796    1          2
1800   -1          1
1803   -1          0
1852   -1         -1
1907    1          0
1980   -1         -1
1999    1          0
2001    1          1
2006   -1          0
Output of solution_dict
{
1691: {'net_living': 1, 'running_sum': 1},
1692: {'net_living': -1, 'running_sum': 0},
1791: {'net_living': 1, 'running_sum': 1},
1796: {'net_living': 1, 'running_sum': 2},
1800: {'net_living': -1, 'running_sum': 1},
1803: {'net_living': -1, 'running_sum': 0},
1852: {'net_living': -1, 'running_sum': -1},
1907: {'net_living': 1, 'running_sum': 0},
1980: {'net_living': -1, 'running_sum': -1},
1999: {'net_living': 1, 'running_sum': 0},
2001: {'net_living': 1, 'running_sum': 1},
2006: {'net_living': -1, 'running_sum': 0}
}
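As a side note on the efficiency question, the same running-sum idea can be written more compactly. The following is only a sketch under my own naming (births, deaths, delta, running, peak_year and peak_total are not from the original post); it uses collections.Counter for the per-year net change and itertools.accumulate for the running total, then picks the year with the largest total.

```python
from collections import Counter
from itertools import accumulate

births = [1791, 1796, 1691, 1907, 1999, 2001, 1907]  # birth dates
deaths = [1800, 1803, 1692, 1907, 1852, 1980, 2006]  # death dates

# Net change per year: +1 for each birth, -1 for each death.
delta = Counter(births)
delta.subtract(deaths)

# Sort the distinct years once, then accumulate the net changes in that order.
years = sorted(delta)
running = list(accumulate(delta[year] for year in years))

# max() keeps the first maximum it sees, so ties resolve to the earliest year.
peak_total, peak_year = max(zip(running, years), key=lambda pair: pair[0])
print(peak_year, peak_total)  # 1796 2
```

The sort dominates the cost, so this is O(n log n) time in the number of distinct years and O(n) extra space, which is the same order as the OrderedDict version above.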
I would use pandas, and make use of its DataFrame object:
Make the dataframe of people's year of birth and year of death:
import numpy as np
import pandas as pd
born = [1791, 1796, 1691, 1907, 1999, 2001, 1907] # birth dates
died = [1800, 1803, 1692, 1907, 1852, 1980, 2006] # death dates
people = pd.DataFrame({'born': born, 'died': died})
Make a dataframe that includes all years between the first listed birth, and the last listed death:
years = pd.DataFrame(index=np.arange(people['born'].min(), people['died'].max() + 1))
Find the total number of people alive for each of those years:
for year in years.index:
    num_living = ((year > people['born']) & (year < people['died'])).sum()
    years.loc[year, 'total_living'] = num_living
Calling years.tail() yields the following:
      total_living
2002           1.0
2003           1.0
2004           1.0
2005           1.0
2006           0.0
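From there, you can simply do an argmax on the 'total_living' column (in recent pandas versions, idxmax returns the year label itself, whereas argmax returns the integer position).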
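To make that last step concrete, here is a minimal end-to-end sketch of the approach described above. The names peak_year and peak_total are mine, and I build the 'total_living' column with a list comprehension instead of the .loc loop, so treat it as one possible way to write it, not the only one.

```python
import numpy as np
import pandas as pd

born = [1791, 1796, 1691, 1907, 1999, 2001, 1907]  # birth dates
died = [1800, 1803, 1692, 1907, 1852, 1980, 2006]  # death dates
people = pd.DataFrame({'born': born, 'died': died})

# One row per calendar year, from the first birth to the last death.
years = pd.DataFrame(index=np.arange(people['born'].min(), people['died'].max() + 1))

# Same strict comparisons as above: a person counts as alive only in the
# years strictly between their birth year and their death year.
years['total_living'] = [
    int(((year > people['born']) & (year < people['died'])).sum())
    for year in years.index
]

peak_year = years['total_living'].idxmax()  # index label (the year) of the first maximum
peak_total = years['total_living'].max()
print(peak_year, peak_total)  # 1797 2
```

Note that because the comparisons are strict, the birth year itself is not counted, so the peak lands on 1797 here and not on 1796 as in the running-sum table. Switching to >= on the birth side would count people from their birth year onward.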
To be clear, I assumed a logical situation of people dying after they are born, and (therefore) that there are never negative numbers of people alive.
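If you want to check that assumption against the data before counting, something like the following sketch would do it. It assumes, as the zip above does, that the two lists are paired by position; the names valid and invalid are hypothetical.

```python
born = [1791, 1796, 1691, 1907, 1999, 2001, 1907]  # birth dates
died = [1800, 1803, 1692, 1907, 1852, 1980, 2006]  # death dates

# Split the (born, died) pairs by whether death precedes birth.
valid = [(b, d) for b, d in zip(born, died) if d >= b]
invalid = [(b, d) for b, d in zip(born, died) if d < b]

print(invalid)  # [(1999, 1852), (2001, 1980)]
```

With the sample lists, two pairs fail the check, which is why the running sum in the other answer dips below zero.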