I am new to Python and cannot figure out how to count the sales calls made in the 20 days after a salesperson's FIRST sales call. The task is to find the percentage of salespeople who made at least 10 sales calls in their first 20 days. Each row is a sales call, the salespeople are identified by the column id, and the call time is recorded in call_starttime.
The df is fairly simple and looks like this:
      id   call_starttime  level
0  66547  7/28/2015 23:18      1
1  66272  8/10/2015 20:48      0
2  66547  8/20/2015 17:32      2
3  66272  8/31/2015 18:21      0
4  66272  8/31/2015 20:25      0
I have already counted the number of calls per id and can filter out anyone who has not made at least 10 sales calls.
The code I am currently using is:
df_withcount=df.groupby(['cc_user_id','cc_cohort']).size().reset_index(name='count')
df_20andmore=df_withcount.loc[(df_withcount['count'] >= 20)]
I expect the output to be the number of ids (salespeople) who made at least 10 calls in their first 20 days. So far I can only work out who made at least 10 calls over all time.
I assume that the call_starttime column is of datetime type.
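If the column was read in as text, as the sample in the question suggests, it has to be converted first. The following is a minimal sketch; the format string is an assumption based on the month/day/year layout of the sample rows and may need adjusting for the real file.

```python
import pandas as pd

# Sample rows copied from the question; call_starttime starts out as text.
df = pd.DataFrame({
    "id": [66547, 66272, 66547, 66272, 66272],
    "call_starttime": ["7/28/2015 23:18", "8/10/2015 20:48", "8/20/2015 17:32",
                       "8/31/2015 18:21", "8/31/2015 20:25"],
    "level": [1, 0, 2, 0, 0],
})

# Assumed layout: month/day/year hour:minute, as in the sample above.
df["call_starttime"] = pd.to_datetime(df["call_starttime"], format="%m/%d/%Y %H:%M")

print(df.dtypes)
```

After the conversion, subtracting two values of the column yields a Timedelta, which is what the date arithmetic in the rest of this answer relies on.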
Let's start from a simplified solution that checks only the second call (not 10 subsequent calls).
I slightly changed your test data, so that the person with id = 66272 has the second call within 20 days after the first (August 10 and 19):
      id       call_starttime  level
0  66547  2015-07-28 23:18:00      1
1  66272  2015-08-10 20:48:00      0
2  66547  2015-08-20 17:32:00      2
3  66272  2015-08-19 18:21:00      0
4  66272  2015-08-31 20:25:00      0
The first step is to define a function stating whether the current person is "active" (made the second call within 20 days of the first):
def active(grp):
    if grp.shape[0] < 2:
        return False  # Single call
    d0 = grp.call_starttime.iloc[0]
    d1 = grp.call_starttime.iloc[1]
    return (d1 - d0).days < 20
This function will be applied to each group of rows (for each person).
To get detailed information on activity of each person, you can run:
df.groupby('id').apply(active)
For my sample data the result is:
id
66272 True
66547 False
dtype: bool
But if you are interested only in the number of active people, use np.count_nonzero on the above result:
np.count_nonzero(df.groupby('id').apply(active))
For my sample data the result is 1.
If you want the percentage of active people, divide this number by df.id.unique().size and multiply by 100.
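Put together, the steps so far might look like the sketch below. It uses the modified sample data from this answer. As a small variation (my assumption, not part of the original function), active here receives only the call_starttime values of one person instead of the whole group of rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [66547, 66272, 66547, 66272, 66272],
    "call_starttime": pd.to_datetime([
        "2015-07-28 23:18", "2015-08-10 20:48", "2015-08-20 17:32",
        "2015-08-19 18:21", "2015-08-31 20:25"]),
    "level": [1, 0, 2, 0, 0],
})

def active(calls):
    """calls: the call_starttime values of one person, in call order."""
    if calls.size < 2:
        return False  # Single call
    return (calls.iloc[1] - calls.iloc[0]).days < 20

is_active = df.groupby("id")["call_starttime"].apply(active)
n_active = np.count_nonzero(is_active)
pct_active = n_active * 100 / df.id.unique().size
print(pct_active)  # 50.0: one of the two people is active
```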
And now, how to change this solution to check whether a person has made at least 10 calls in the initial 20 days?
The only difference is that the active function should compare the dates of calls No. 0 and 9 (the first and the tenth).
So this function should be changed to:
def active(grp):
    if grp.shape[0] < 10:
        return False  # Too few calls
    d0 = grp.call_starttime.iloc[0]
    d1 = grp.call_starttime.iloc[9]
    return (d1 - d0).days < 20
I assume that the source rows are ordered by call_starttime. If this is not the case, call sort_values(by='call_starttime') first.
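A sketch that folds the sorting into the function and makes both thresholds adjustable is shown below. The parameter names n_calls and n_days and the test data are made up for illustration, and the function again works on one person's call_starttime values rather than on the whole group of rows.

```python
import pandas as pd

def active(calls, n_calls=10, n_days=20):
    """True if call number n_calls came fewer than n_days days after the first."""
    if calls.size < n_calls:
        return False  # Too few calls
    calls = calls.sort_values()  # do not rely on the row order of the source
    return (calls.iloc[n_calls - 1] - calls.iloc[0]).days < n_days

# Hypothetical data: person 1 calls once a day, person 2 once a week.
first_day = pd.Timestamp("2015-08-01 09:00")
daily = [first_day + pd.Timedelta(days=i) for i in range(10)]
weekly = [first_day + pd.Timedelta(weeks=i) for i in range(10)]
df = pd.DataFrame({"id": [1] * 10 + [2] * 10, "call_starttime": daily + weekly})
df = df.sample(frac=1, random_state=0)  # shuffle the rows on purpose

result = df.groupby("id")["call_starttime"].apply(active)
pct = result.astype(bool).mean() * 100  # share of True values, in percent
print(result)
print(pct)
```

Person 1 reaches the tenth call after 9 days, person 2 only after 63 days, so only person 1 counts as active.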
I came up with another solution, which also groups by the level column, does not require the source data to be sorted, and is easy to parametrize with the number of initial days and the number of calls in this period.
Test DataFrame:
      id       call_starttime  level
0  66547  2015-07-28 23:18:00      1
1  66272  2015-08-10 19:48:00      0
2  66547  2015-08-20 17:32:00      1
3  66272  2015-08-19 18:21:00      0
4  66272  2015-08-29 20:25:00      0
5  66777  2015-08-30 20:00:00      0
Level 0 contains one person with 3 calls within the first 20 days (August 10, 19 and 29). Note however that the last call has a later hour than the first, so these 2 Timestamps are actually more than 19 days apart, but since my solution clears the time component, this last call will be counted.
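To make the effect of clearing the time component concrete, here is a small sketch with the first and last call of person 66272 from the test data. It only illustrates the date arithmetic and is not part of the solution itself.

```python
import pandas as pd

first = pd.Timestamp("2015-08-10 19:48")
last = pd.Timestamp("2015-08-29 20:25")

raw_gap = last - first                             # keeps hours and minutes
floored_gap = last.floor("d") - first.floor("d")   # whole calendar days only

print(raw_gap)
print(floored_gap)

# The rule used below: a call counts if its date is before first date + 20 days.
counted = last.floor("d") < first.floor("d") + pd.Timedelta(days=20)
print(counted)
```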
Start by defining a function:
def activity(grp, dayNo):
    stDates = grp.dt.floor('d')  # Delete time component
    # Leave dates from starting "dayNo" days
    stDates = stDates[stDates < stDates.min() + pd.offsets.Day(dayNo)]
    return stDates.size
which returns the number of calls made by a particular person (a group of call_starttime values) within the first dayNo days.
The next function to define is:
def percentage(s, callNo):
    return s[s >= callNo].size * 100 / s.size
which computes the percentage of values in s (a Series for the current level) that are >= callNo.
The first processing step is to compute a Series with the number of calls within the defined "starting period" for each level / id pair:
calls = df.groupby(['level', 'id']).call_starttime.apply(activity, dayNo=20)
The result (for my data) is:
level  id
0      66272    3
       66777    1
1      66547    1
Name: call_starttime, dtype: int64
To get the final result (percentages for each level, assuming the requirement to make 3 calls), run:
calls.groupby(level=0).apply(percentage, callNo=3)
Note that level=0 above is a reference to the MultiIndex level, not to the column name.
The result (again for my data) is:
level
0 50.0
1 0.0
Name: call_starttime, dtype: float64
Level 0 has one person meeting the criterion (out of 2 people at this level), so the percentage is 50, and at level 1 nobody meets the criterion, so the percentage is 0.
Note that dayNo and callNo parameters allow easy parametrization concerning the length of the "initial period" for each person and the number of calls to be made in this period.
The computation described above is for 3 calls, but in your case change callNo to your value, i.e. 10.
As you can see this solution is quite short (only 8 lines of code), much shorter and much more "Pandasonic" than the other solution.
If you prefer a "terse" coding style, you can also do the whole computation in a single (although significantly chained) instruction:
df.groupby(['level', 'id']).call_starttime\
.apply(activity, dayNo=20).rename('Percentage')\
.groupby(level=0).apply(percentage, callNo=3)
I added .rename('Percentage') to change the name of the result Series.
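The question asks for a single percentage over all salespeople rather than one per level. Assuming that is what is wanted, the same two helper functions can be reused by grouping on id alone. The sketch below runs on the test DataFrame from this answer with callNo=3; the variable names per_level, calls_per_id and overall are mine.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [66547, 66272, 66547, 66272, 66272, 66777],
    "call_starttime": pd.to_datetime([
        "2015-07-28 23:18", "2015-08-10 19:48", "2015-08-20 17:32",
        "2015-08-19 18:21", "2015-08-29 20:25", "2015-08-30 20:00"]),
    "level": [1, 0, 1, 0, 0, 0],
})

def activity(grp, dayNo):
    stDates = grp.dt.floor("d")  # Delete time component
    # Keep only the dates from the first dayNo days
    stDates = stDates[stDates < stDates.min() + pd.offsets.Day(dayNo)]
    return stDates.size

def percentage(s, callNo):
    return s[s >= callNo].size * 100 / s.size

# Per-level result, as described in the answer
per_level = (df.groupby(["level", "id"]).call_starttime
               .apply(activity, dayNo=20)
               .groupby(level=0).apply(percentage, callNo=3))

# Overall result: one call count per person, then a single percentage
calls_per_id = df.groupby("id").call_starttime.apply(activity, dayNo=20)
overall = percentage(calls_per_id, callNo=3)

print(per_level)
print(overall)
```

For the real data, callNo would be 10 instead of 3.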
I used a Person class to help solve this problem.
I have tested my code and it works. There is room for improvement, but my main focus was getting a working solution. Let me know if you have any questions.
import pandas as pd
from datetime import timedelta

# prep data for dataframe
lst = {'call_start_time': ['7/28/2015', '8/10/2015', '7/28/2015', '7/28/2015'],
       'level': ['1', '0', '1', '1'],
       'id': ['66547', '66272', '66547', '66547']}
# create dataframe
df = pd.DataFrame(lst)
# convert the date strings to datetime values so that days can be added
# (assigning to `row` inside iterrows() would not change the dataframe)
df['call_start_time'] = pd.to_datetime(df['call_start_time'], format='%m/%d/%Y')
# get the end date by adding 20 days to start day
df["end_of_20_days"] = df["call_start_time"] + timedelta(days=20)
# used below comment for testing might need it later
# df['Difference'] = (df['end_of_20_days'] - df['call_start_time']).dt.days
# created person class to keep track of days_count and id
class Person(object):
    def __init__(self, id, start_date, end_date):
        self.id = id
        self.start_date = start_date
        self.end_date = end_date
        self.days_count = 1
# create list to hold objects of person class
person_list = []
# populate person_list with Person objects and their attributes
# (the first row seen for an id is treated as that person's first call,
# so the rows are expected to be in chronological order per person)
for index, row in df.iterrows():
    # get result_id to use as conditional for populating Person objects
    result_id = any(x.id == row['id'] for x in person_list)
    # initialize Person objects and inject with data from dataframe
    if not result_id:
        person_list.append(Person(row['id'], row['call_start_time'], row['end_of_20_days']))
    else:
        for x in person_list:
            # if call_start_time falls inside this person's 20-day window, increment days_count
            if x.id == row['id'] and x.start_date <= row['call_start_time'] < x.end_date:
                x.days_count += 1
                break
# flag to check if nobody hit the sales mark
flag = 0
# print out only person_list ids who have hit the sales mark
for person in person_list:
    if person.days_count >= 10:
        flag = 1
        print("person id:{} has made {} calls within the past 20 days since first call date".format(person.id, person.days_count))
if flag == 0:
    print("No one has hit the sales mark")
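One possible improvement is to let pandas do the bookkeeping instead of a class and explicit loops. The following is only a sketch of that idea, not a drop-in replacement for the code above: the data is made up, and the variable names first_call, in_window, calls_in_window and hit_mark are my own.

```python
import pandas as pd

# Hypothetical data: id 66547 makes 12 calls on consecutive days,
# id 66272 makes 3 calls that are 30 days apart.
day0 = pd.Timestamp("2015-07-28")
busy = [day0 + pd.Timedelta(days=i) for i in range(12)]
slow = [day0 + pd.Timedelta(days=30 * i) for i in range(3)]
df = pd.DataFrame({
    "id": ["66547"] * 12 + ["66272"] * 3,
    "call_start_time": busy + slow,
})

# First call of each person, repeated on every row of that person
first_call = df.groupby("id")["call_start_time"].transform("min")
# True for calls made fewer than 20 days after the person's first call
in_window = df["call_start_time"] < first_call + pd.Timedelta(days=20)
# Number of such calls per person
calls_in_window = in_window.groupby(df["id"]).sum()

hit_mark = calls_in_window >= 10
pct = hit_mark.mean() * 100  # percentage of people with at least 10 calls

print(calls_in_window)
print(pct)
```

Because the first call is found with min, this version does not depend on the order of the rows.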