Edited to add a payload example and the complete script. Edited again to modify the script and format my problem better.
I am creating a script to analyse payment cycles from a bank statement containing multiple payments. I work out the most frequent day of the week and the most frequent date of the month, then select whichever count is higher: either a day of the week (along with its position and the frequency of payments within a given month) or a specific date.
Where it's a specific date, I expect the second and third most frequent dates to fall either side of it, covering the months where the most frequent date lands on a weekend or a public holiday.
I created a function to do a count of the days in a week without needing to sort the dataframe and to allow me to add that value as a column in the dataframe rather than it ending up as a list.
What do I need help with?
That was fine for working through 7 days and counting based on filters but, when doing it by the date of the month, my function for it has 31 if statements. How can I make this more concise and gain the same outcome?
Also, I have a problem with this warning:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
which I simply can't get to go away. Copy vs. view: I really don't mind which one it uses, I just want to get rid of the warning one way or the other.
Script and Example Data
Below is the part of the script I need to make tidier:
# function to filter by date of month to perform a count of each
import pandas as pd
from datetime import datetime
df = pd.read_csv(r"C:\Users\mattl\OneDrive\Desktop\netflix - only.csv")
# Convert to a Date format here
df['new_date']=df['date'].apply(lambda x: datetime.strptime(x, '%d/%m/%Y'))
# Extend data frame with month, day of month and week day
df['month'] = df['new_date'].apply(lambda x: x.month)
df['dom'] = df['new_date'].apply(lambda x: x.day)
df['dow']=df['new_date'].apply(lambda x: x.strftime("%A"))
# function to filter by weekday to perform a count of each
def totalForWeekDay(weekDay):
    filter = df.dow.value_counts();
    #print(filter);
    if weekDay == 'Sunday':
        return filter['Sunday'];
    if weekDay == 'Monday':
        return filter['Monday'];
    if weekDay == 'Tuesday':
        return filter['Tuesday'];
    if weekDay == 'Wednesday':
        return filter['Wednesday'];
    if weekDay == 'Thursday':
        return filter['Thursday'];
    if weekDay == 'Friday':
        return filter['Friday'];
    if weekDay == 'Saturday':
        return filter['Saturday'];
# function to filter by date of month to perform a count of each
def totalForMonthDate(monthDay):
    filter = df.dom.value_counts();
    #print(filter);
    if monthDay == 31:
        return filter[31];
    if monthDay == 30:
        return filter[30];
    if monthDay == 29:
        return filter[29];
    if monthDay == 28:
        return filter[28];
    if monthDay == 27:
        return filter[27];
    if monthDay == 26:
        return filter[26];
    if monthDay == 25:
        return filter[25];
    if monthDay == 24:
        return filter[24];
    if monthDay == 23:
        return filter[23];
    if monthDay == 22:
        return filter[22];
    if monthDay == 21:
        return filter[21];
    if monthDay == 20:
        return filter[20];
    if monthDay == 19:
        return filter[19];
    if monthDay == 18:
        return filter[18];
    if monthDay == 17:
        return filter[17];
    if monthDay == 16:
        return filter[16];
    if monthDay == 15:
        return filter[15];
    if monthDay == 14:
        return filter[14];
    if monthDay == 13:
        return filter[13];
    if monthDay == 12:
        return filter[12];
    if monthDay == 11:
        return filter[11];
    if monthDay == 10:
        return filter[10];
    if monthDay == 9:
        return filter[9];
    if monthDay == 8:
        return filter[8];
    if monthDay == 7:
        return filter[7];
    if monthDay == 6:
        return filter[6];
    if monthDay == 5:
        return filter[5];
    if monthDay == 4:
        return filter[4];
    if monthDay == 3:
        return filter[3];
    if monthDay == 2:
        return filter[2];
    if monthDay == 1:
        return filter[1];
# Add column which calls the function resulting it total count of week_day
df['dow_total'] = df['dow'].apply(lambda row: totalForWeekDay(row));
# Add formula and column to dataframe which counts the month number
df['dom_total'] = df['dom'].apply(lambda row: totalForMonthDate(row));
# Show results
print(df)
if df["dom_total"].max() >= df["dow_total"].max():
    # Determine the top day of month result
    top_dom_tot = df.loc[df['dom_total'] == df['dom_total'].max()]
    # isolate the top day of month
    top_day_of_month = (top_dom_tot['dom'][0])
    print('Top day of month is:')
    print(top_day_of_month)
    # find dates in list where the date of month is NOT the highest number
    dfa = df.loc[df['dom'] != top_day_of_month]
    # Determine number of days forwards (positive) or back (negative)
    dfa['days_diff'] = df['dom'] - top_day_of_month
    print('Payments that are not related to the top day per month')
    print(dfa)
Now for the example csv payload:
type,party,date, debit , credit
payment,Netflix,22/01/2021,-$19.99, $-
payment,Netflix,22/02/2021,-$19.99, $-
payment,Netflix,22/03/2021,-$19.99, $-
payment,Netflix,22/04/2021,-$19.99, $-
payment,Netflix,24/05/2021,-$19.99, $-
payment,Netflix,22/06/2021,-$19.99, $-
payment,Netflix,22/07/2021,-$19.99, $-
payment,Netflix,23/08/2021,-$19.99, $-
payment,Netflix,22/09/2021,-$19.99, $-
payment,Netflix,22/10/2021,-$19.99, $-
Thanks for your advice!
In totalForMonthDate(), you can replace this series of if statements with two lines:
def totalForMonthDate(monthDay):
    filter = df.dom.value_counts()
    return filter[monthDay]
Of course, you're also running value_counts() once for every row in your dataframe, when it's the same for the whole dataframe. That's inefficient. You can replace this by doing value_counts() once and using map to translate the values:
df['dom_total'] = df['dom'].map(df['dom'].value_counts())
Not only is this shorter (1 line vs 4 lines) but it's faster too.
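To make the lookup concrete, here is a minimal self-contained sketch. The six sample values are made up for illustration and are not taken from the statement; the groupby/transform line is an alternative I'm adding for comparison, not something the original script uses.

```python
import pandas as pd

# Hypothetical sample: day-of-month values for six payments
df = pd.DataFrame({"dom": [22, 22, 24, 22, 23, 22]})

# value_counts() returns a Series indexed by each distinct day,
# holding how many times that day occurs
counts = df["dom"].value_counts()

# map() looks up every row's day in that Series, in one pass
df["dom_total"] = df["dom"].map(counts)

# Alternative with the same result: group by the day and
# broadcast each group's row count back onto its rows
df["dom_total_alt"] = df.groupby("dom")["dom"].transform("count")

print(df)
```

Both columns should come out identical, so which one to use is mostly a matter of taste.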
You're getting a SettingWithCopyWarning because you're using .loc to filter down the dataframe, then modifying that filtered subset. The simplest way to fix this is to throw in a copy when you're subsetting the dataframe.
dfa = df.loc[df['dom'] != top_day_of_month].copy()
Note: the code afterward which adds a new column won't affect the original dataframe.
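As a small illustration of that note, the sketch below uses a made-up five-row frame (an assumption for the example, not the real statement) to show that the column lands on the copy only.

```python
import pandas as pd

# Hypothetical sample data; 22 plays the role of the top day of month
df = pd.DataFrame({"dom": [22, 22, 24, 22, 23]})
top_day_of_month = 22

# Take an explicit copy of the filtered rows so that dfa is an
# independent dataframe rather than a slice of df
dfa = df.loc[df["dom"] != top_day_of_month].copy()

# Assigning a new column on the copy does not raise the warning
dfa["days_diff"] = dfa["dom"] - top_day_of_month

print(dfa)
print("days_diff" in df.columns)  # False: the original is untouched
```

If you do want the column on the original dataframe instead, assign it on df directly and filter afterwards.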
Here is the full source code:
import pandas as pd
import io
from datetime import datetime
s = """type,party,date, debit , credit
payment,Netflix,22/01/2021,-$19.99, $-
payment,Netflix,22/02/2021,-$19.99, $-
payment,Netflix,22/03/2021,-$19.99, $-
payment,Netflix,22/04/2021,-$19.99, $-
payment,Netflix,24/05/2021,-$19.99, $-
payment,Netflix,22/06/2021,-$19.99, $-
payment,Netflix,22/07/2021,-$19.99, $-
payment,Netflix,23/08/2021,-$19.99, $-
payment,Netflix,22/09/2021,-$19.99, $-
payment,Netflix,22/10/2021,-$19.99, $- """
df = pd.read_csv(io.StringIO(s))
# Convert to a Date format here
df['new_date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
# Extend data frame with month, day of month and week day
df['month'] = df['new_date'].dt.month
df['dom'] = df['new_date'].dt.day
df['dow'] = df['new_date'].dt.strftime("%A")
# Add column which calls the function resulting it total count of week_day
df['dow_total'] = df['dow'].map(df['dow'].value_counts())
# Add formula and column to dataframe which counts the month number
df['dom_total'] = df['dom'].map(df['dom'].value_counts())
# Show results
print(df)
if df["dom_total"].max() >= df["dow_total"].max():
    # Determine the top day of month result
    top_dom_tot = df.loc[df['dom_total'] == df['dom_total'].max()]
    # isolate the top day of month
    top_day_of_month = (top_dom_tot['dom'][0])
    print('Top day of month is:')
    print(top_day_of_month)
    # find dates in list where the date of month is NOT the highest number
    dfa = df.loc[df['dom'] != top_day_of_month].copy()
    # Determine number of days forwards (positive) or back (negative)
    dfa['days_diff'] = df['dom'] - top_day_of_month
    print('Payments that are not related to the top day per month')
    print(dfa)
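One thing to watch in the script above: the [0] in top_dom_tot['dom'][0] is a label lookup, so it only works when the first row of the statement happens to be a top-day payment, as it is in the example data. A possible alternative, sketched here on made-up data where the first row is not the top day, is to take the most frequent day straight from value_counts() with idxmax().

```python
import pandas as pd

# Hypothetical sample where the first payment is NOT on the top day
df = pd.DataFrame({"dom": [24, 22, 22, 23, 22]})

# idxmax() returns the index label of the largest count,
# which here is the most frequent day of month
top_day_of_month = df["dom"].value_counts().idxmax()
print("Top day of month is:", top_day_of_month)

# Payments on any other day, with the offset from the top day
# (positive means later in the month, negative means earlier)
dfa = df.loc[df["dom"] != top_day_of_month].copy()
dfa["days_diff"] = dfa["dom"] - top_day_of_month
print(dfa)
```

Note that if two days tie for the highest count, idxmax() returns just one of them, so a tie would need handling separately.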
Anyways, hope that helped. Cool project!