How to count the daily number of cases within a month by using Pandas' DataFrame?
Python function to count day-of-month occurrences within a dataframe
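Edited to add a sample payload and the complete script. Edited again to revise the script and format my question better.
I'm creating a script to analyse payment cycles from a bank statement containing multiple payments. I'm counting the most frequent day of the week and date of the month, then selecting whichever is highest: either a day of the week, along with its position and payment frequency within a given month, or a specific date.
If it is a specific date, I expect the second and third most frequent dates to fall on either side of the most-used date, because of weekends or public holidays.
I created a function that counts the days of the week without having to sort the dataframe, and lets me add the value as a column to the dataframe instead of ending up with a separate list.
What do I need help with?
This works fine for the 7 days of the week, counting based on a filter. For the date of the month, however, my function has 31 if statements. How can I make it more concise and get the same result?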
Also, I'm running into this warning: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
I just can't get my head around copy vs. view. I really don't mind whether it uses a copy or a view; I just want to get rid of the warning one way or another.
Script and sample data
Here is the part of the script I need to make tidier:
# function to filter by date of month to perform a count of each
import pandas as pd
from datetime import datetime
df = pd.read_csv(r"C:\Users\mattl\OneDrive\Desktop\netflix - only.csv")
# Convert to a Date format here
df['new_date']=df['date'].apply(lambda x: datetime.strptime(x, '%d/%m/%Y'))
# Extend data frame with month, day of month and week day
df['month'] = df['new_date'].apply(lambda x: x.month)
df['dom'] = df['new_date'].apply(lambda x: x.day)
df['dow']=df['new_date'].apply(lambda x: x.strftime("%A"))
# function to filter by weekday to perform a count of each
def totalForWeekDay(weekDay):
    filter = df.dow.value_counts();
    #print(filter);
    if weekDay == 'Sunday':
        return filter['Sunday'];
    if weekDay == 'Monday':
        return filter['Monday'];
    if weekDay == 'Tuesday':
        return filter['Tuesday'];
    if weekDay == 'Wednesday':
        return filter['Wednesday'];
    if weekDay == 'Thursday':
        return filter['Thursday'];
    if weekDay == 'Friday':
        return filter['Friday'];
    if weekDay == 'Saturday':
        return filter['Saturday'];
# function to filter by date of month to perform a count of each
def totalForMonthDate(monthDay):
    filter = df.dom.value_counts();
    #print(filter);
    if monthDay == 31:
        return filter[31];
    if monthDay == 30:
        return filter[30];
    if monthDay == 29:
        return filter[29];
    if monthDay == 28:
        return filter[28];
    if monthDay == 27:
        return filter[27];
    if monthDay == 26:
        return filter[26];
    if monthDay == 25:
        return filter[25];
    if monthDay == 24:
        return filter[24];
    if monthDay == 23:
        return filter[23];
    if monthDay == 22:
        return filter[22];
    if monthDay == 21:
        return filter[21];
    if monthDay == 20:
        return filter[20];
    if monthDay == 19:
        return filter[19];
    if monthDay == 18:
        return filter[18];
    if monthDay == 17:
        return filter[17];
    if monthDay == 16:
        return filter[16];
    if monthDay == 15:
        return filter[15];
    if monthDay == 14:
        return filter[14];
    if monthDay == 13:
        return filter[13];
    if monthDay == 12:
        return filter[12];
    if monthDay == 11:
        return filter[11];
    if monthDay == 10:
        return filter[10];
    if monthDay == 9:
        return filter[9];
    if monthDay == 8:
        return filter[8];
    if monthDay == 7:
        return filter[7];
    if monthDay == 6:
        return filter[6];
    if monthDay == 5:
        return filter[5];
    if monthDay == 4:
        return filter[4];
    if monthDay == 3:
        return filter[3];
    if monthDay == 2:
        return filter[2];
    if monthDay == 1:
        return filter[1];
# Add column which calls the function resulting it total count of week_day
df['dow_total'] = df['dow'].apply(lambda row: totalForWeekDay(row));
# Add formula and column to dataframe which counts the month number
df['dom_total'] = df['dom'].apply(lambda row: totalForMonthDate(row));
# Show results
print(df)
if df["dom_total"].max() >= df["dow_total"].max():
    # Determine the top day of month result
    top_dom_tot = df.loc[df['dom_total'] == df['dom_total'].max()]
    # isolate the top day of month
    top_day_of_month = (top_dom_tot['dom'][0])
    print('Top day of month is:')
    print(top_day_of_month)
    # find dates in list where the date of month is NOT the highest number
    dfa = df.loc[df['dom'] != top_day_of_month]
    # Determine number of days forwards (positive) or back (negative)
    dfa['days_diff'] = df['dom'] - top_day_of_month
    print('Payments that are not related to the top day per month')
    print(dfa)
Now for the sample CSV payload:
type,party,date, debit , credit
payment,Netflix,22/01/2021,-$19.99, $-
payment,Netflix,22/02/2021,-$19.99, $-
payment,Netflix,22/03/2021,-$19.99, $-
payment,Netflix,22/04/2021,-$19.99, $-
payment,Netflix,24/05/2021,-$19.99, $-
payment,Netflix,22/06/2021,-$19.99, $-
payment,Netflix,22/07/2021,-$19.99, $-
payment,Netflix,23/08/2021,-$19.99, $-
payment,Netflix,22/09/2021,-$19.99, $-
payment,Netflix,22/10/2021,-$19.99, $-
Thanks for your advice!
In totalForMonthDate(), you can replace that series of if statements with two lines:
def totalForMonthDate(monthDay):
    filter = df.dom.value_counts()
    return filter[monthDay]
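Of course, you're also running value_counts() once for every row in the dataframe, even though the result is the same for the whole dataframe. That's inefficient. You can replace it by calling value_counts() once and using map to translate the values: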
df['dom_total'] = df['dom'].map(df['dom'].value_counts())
Not only is this shorter (1 line versus 4), it's also faster.
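As a rough illustration of why the map approach works, here is a minimal sketch on a small made-up Series (the values are hypothetical, not taken from the CSV in the question). value_counts() returns a Series indexed by the distinct values, so map can look each row's value up in it. The groupby/transform line is an alternative I'm adding for comparison, and it should give the same counts.

```python
import pandas as pd

# Hypothetical day-of-month values, for illustration only
dom = pd.Series([22, 22, 24, 22, 23], name="dom")

# Computed once; the index holds the distinct days, the values hold the counts
counts = dom.value_counts()

# map() looks up each row's day in the index of `counts`
dom_total = dom.map(counts)

# Alternative: group rows by their own value and broadcast each group's count
dom_total_alt = dom.groupby(dom).transform("count")

print(dom_total.tolist())
print(dom_total_alt.tolist())
```

Either form computes the counts once for the whole column instead of once per row.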
You're getting the SettingWithCopyWarning because you filter the dataframe with .loc and then modify the filtered subset. The simplest way to fix this is to take a copy when you subset the dataframe.
dfa = df.loc[df['dom'] != top_day_of_month].copy()
Note: the later code that adds a new column will not affect the original dataframe.
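To make that note concrete, here is a small sketch using a made-up dataframe (the data and the hard-coded top_day_of_month are assumptions for illustration). After the copy, the new column exists only on dfa.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"dom": [22, 22, 24, 23]})
top_day_of_month = 22  # assumed value for this sketch

# .copy() makes dfa an independent DataFrame, so the column assignment
# below is unambiguous and there is nothing for pandas to warn about
dfa = df.loc[df["dom"] != top_day_of_month].copy()
dfa["days_diff"] = dfa["dom"] - top_day_of_month

print(dfa)
print(list(df.columns))  # the original dataframe has no days_diff column
```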
Here is the complete source code:
import pandas as pd
import io
from datetime import datetime
s = """type,party,date, debit , credit
payment,Netflix,22/01/2021,-$19.99, $-
payment,Netflix,22/02/2021,-$19.99, $-
payment,Netflix,22/03/2021,-$19.99, $-
payment,Netflix,22/04/2021,-$19.99, $-
payment,Netflix,24/05/2021,-$19.99, $-
payment,Netflix,22/06/2021,-$19.99, $-
payment,Netflix,22/07/2021,-$19.99, $-
payment,Netflix,23/08/2021,-$19.99, $-
payment,Netflix,22/09/2021,-$19.99, $-
payment,Netflix,22/10/2021,-$19.99, $- """
df = pd.read_csv(io.StringIO(s))
# Convert to a Date format here
df['new_date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
# Extend data frame with month, day of month and week day
df['month'] = df['new_date'].dt.month
df['dom'] = df['new_date'].dt.day
df['dow'] = df['new_date'].dt.strftime("%A")
# Add a column with the total count for each row's weekday
df['dow_total'] = df['dow'].map(df['dow'].value_counts())
# Add a column with the total count for each row's day of month
df['dom_total'] = df['dom'].map(df['dom'].value_counts())
# Show results
print(df)
if df["dom_total"].max() >= df["dow_total"].max():
    # Determine the top day of month result
    top_dom_tot = df.loc[df['dom_total'] == df['dom_total'].max()]
    # isolate the top day of month (.iloc[0] takes the first matching row
    # by position, so it does not depend on a row labelled 0 being present)
    top_day_of_month = top_dom_tot['dom'].iloc[0]
    print('Top day of month is:')
    print(top_day_of_month)
    # find dates in list where the date of month is NOT the highest number
    dfa = df.loc[df['dom'] != top_day_of_month].copy()
    # Determine number of days forwards (positive) or back (negative)
    dfa['days_diff'] = dfa['dom'] - top_day_of_month
    print('Payments that are not related to the top day per month')
    print(dfa)
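As a side note, the top day of month can also be read directly from value_counts() without first filtering the dataframe. This is only a sketch on made-up values, not part of the script above; idxmax() returns the index label (here, the day) that has the largest count.

```python
import pandas as pd

# Hypothetical day-of-month values, for illustration only
dom = pd.Series([22, 22, 24, 22, 23])

counts = dom.value_counts()

# Label (day of month) with the highest count, and the count itself
top_day_of_month = counts.idxmax()
top_count = counts.max()

print(top_day_of_month, top_count)
```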
Anyway, hope that helps. Cool project!