I want to compute week of the month for a specified date. For computing week of the month, I currently use the user-defined function.
Input data frame:
Output data frame:
Here is what I have tried:
from math import ceil
def week_of_month(dt):
"""
Returns the week of the month for the specified date.
"""
first_day = dt.replace(day=1)
dom = dt.day
adjusted_dom = dom + first_day.weekday()
return int(ceil(adjusted_dom/7.0))
After this,
import pandas as pd
df = pd.read_csv("input_dataframe.csv")
df.date = pd.to_datetime(df.date)
df['year_of_date'] = df.date.dt.year
df['month_of_date'] = df.date.dt.month
df['day_of_date'] = df.date.dt.day
wom = pd.Series()
# worker function for creating week of month series
def convert_date(t):
global wom
wom = wom.append(pd.Series(week_of_month(datetime.datetime(t[0],t[1],t[2]))), ignore_index = True)
# calling worker function for each row of dataframe
_ = df[['year_of_date','month_of_date','day_of_date']].apply(convert_date, axis = 1)
# adding new computed column to dataframe
df['week_of_month'] = wom
# here this updated dataframe should look like Output data frame.
What this does is for each row of data frame it computes week of the month using given function. It makes computations slower as the data frame grows to more rows. Because currently I have more than 10M+ rows.
I am looking for a faster way of doing this. What changes can I make to this code to vectorize this operation across all rows?
Thanks in advance.
Edit: What worked for me after reading answers is below code,
first_day_of_month = pd.to_datetime(df.date.values.astype('datetime64[M]'))
df['week_of_month'] = np.ceil((df.date.dt.day + first_day_of_month.weekday) / 7.0).astype(int)
The week_of_month
method can be vectorized. It could be beneficial to not do the conversion to datetime objects, and instead use pandas only methods.
first_day_of_month = df.date.to_period("M").to_timestamp()
df["week_of_month"] = np.ceil((data.day + first_day_of_month.weekday) / 7.0).astype(int)
just right off the bat without even going into your code and mentioning X/Y problems, etc.:
try to get a list of unique dates, I'm sure in the 10M rows you have more than one is a duplicate.
Steps:
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.