Python function for counting day-of-month occurrences within a DataFrame
Edit: added a sample payload and completed the script. Edited again to revise the script and format my question better.
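I'm creating a script to analyse the payment cycle in a bank statement that contains multiple payments. I count the most frequent day of the week and the most frequent day of the month, then pick whichever is highest: either a day of the week (together with its position and the payment frequency within a given month) or a specific date.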
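One possible reading of "its position within a given month" is the nth occurrence of that weekday in the month (for example, the 4th Monday). A minimal sketch under that assumption; `week_of_month` is a hypothetical column name, not part of the original script:

```python
import pandas as pd

# Hypothetical sample: three of the payment dates from the CSV payload
df = pd.DataFrame({"date": ["22/01/2021", "22/02/2021", "24/05/2021"]})
df["new_date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
df["dow"] = df["new_date"].dt.day_name()

# Days 1-7 are the 1st occurrence of a weekday in the month, 8-14 the 2nd, etc.
df["week_of_month"] = (df["new_date"].dt.day - 1) // 7 + 1

print(df[["date", "dow", "week_of_month"]])
```

Grouping on both columns, e.g. `df.groupby(["dow", "week_of_month"]).size()`, would then count how often each "nth weekday" occurs.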
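If it is a specific date, I'd expect the second- and third-highest dates to fall on either side of the most-used date, in months where that date lands on a weekend or public holiday.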
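The sample payload shows this pattern: payments normally land on the 22nd, but 22/05/2021 was a Saturday and 22/08/2021 a Sunday, and those payments appear on the following Monday (24/05 and 23/08). A minimal sketch that flags such rows, assuming the top day of month is already known (hard-coded as 22 here) and checking weekends only; public holidays would need a separate holiday calendar, which is not shown. `expected_dow` and `moved_off_weekend` are hypothetical column names:

```python
import pandas as pd

dates = ["22/04/2021", "24/05/2021", "22/06/2021", "23/08/2021"]
df = pd.DataFrame({"new_date": pd.to_datetime(dates, format="%d/%m/%Y")})

top_day = 22  # assumed: most frequent day of month, computed elsewhere

# Date the payment "should" have fallen on in the same month
# (assumes top_day exists in every month, which holds for 22)
expected = pd.to_datetime(pd.DataFrame({
    "year": df["new_date"].dt.year,
    "month": df["new_date"].dt.month,
    "day": top_day,
}))

df["expected_dow"] = expected.dt.day_name()
# dayofweek: Monday=0 ... Sunday=6, so >= 5 means Saturday or Sunday
df["moved_off_weekend"] = (df["new_date"] != expected) & (expected.dt.dayofweek >= 5)
print(df)
```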
I've created a function that counts the days of the week without having to sort the DataFrame, and lets me add that value as a column of the DataFrame rather than ending up with a list.
What do I need help with?
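This works fine for the 7 days of the week, counting based on a filter. However, for the day of the month my function has 31 if statements. How can I make it more concise and get the same result?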
Also, I'm running into this warning: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
I just can't get to grips with copy vs. view. I don't really mind whether it uses a copy or a view; I just want to get rid of the warning one way or the other.
Script and sample data
Here is the part of the script I need to tidy up:
# function to filter by date of month to perform a count of each
import pandas as pd
from datetime import datetime
df = pd.read_csv(r"C:\Users\mattl\OneDrive\Desktop\netflix - only.csv")
# Convert to a Date format here
df['new_date']=df['date'].apply(lambda x: datetime.strptime(x, '%d/%m/%Y'))
# Extend data frame with month, day of month and week day
df['month'] = df['new_date'].apply(lambda x: x.month)
df['dom'] = df['new_date'].apply(lambda x: x.day)
df['dow']=df['new_date'].apply(lambda x: x.strftime("%A"))
# function to filter by weekday to perform a count of each
def totalForWeekDay(weekDay):
    filter = df.dow.value_counts();
    #print(filter);
    if weekDay == 'Sunday':
        return filter['Sunday'];
    if weekDay == 'Monday':
        return filter['Monday'];
    if weekDay == 'Tuesday':
        return filter['Tuesday'];
    if weekDay == 'Wednesday':
        return filter['Wednesday'];
    if weekDay == 'Thursday':
        return filter['Thursday'];
    if weekDay == 'Friday':
        return filter['Friday'];
    if weekDay == 'Saturday':
        return filter['Saturday'];
# function to filter by date of month to perform a count of each
def totalForMonthDate(monthDay):
    filter = df.dom.value_counts();
    #print(filter);
    if monthDay == 31:
        return filter[31];
    if monthDay == 30:
        return filter[30];
    if monthDay == 29:
        return filter[29];
    if monthDay == 28:
        return filter[28];
    if monthDay == 27:
        return filter[27];
    if monthDay == 26:
        return filter[26];
    if monthDay == 25:
        return filter[25];
    if monthDay == 24:
        return filter[24];
    if monthDay == 23:
        return filter[23];
    if monthDay == 22:
        return filter[22];
    if monthDay == 21:
        return filter[21];
    if monthDay == 20:
        return filter[20];
    if monthDay == 19:
        return filter[19];
    if monthDay == 18:
        return filter[18];
    if monthDay == 17:
        return filter[17];
    if monthDay == 16:
        return filter[16];
    if monthDay == 15:
        return filter[15];
    if monthDay == 14:
        return filter[14];
    if monthDay == 13:
        return filter[13];
    if monthDay == 12:
        return filter[12];
    if monthDay == 11:
        return filter[11];
    if monthDay == 10:
        return filter[10];
    if monthDay == 9:
        return filter[9];
    if monthDay == 8:
        return filter[8];
    if monthDay == 7:
        return filter[7];
    if monthDay == 6:
        return filter[6];
    if monthDay == 5:
        return filter[5];
    if monthDay == 4:
        return filter[4];
    if monthDay == 3:
        return filter[3];
    if monthDay == 2:
        return filter[2];
    if monthDay == 1:
        return filter[1];
# Add column which calls the function, resulting in the total count of each weekday
df['dow_total'] = df['dow'].apply(lambda row: totalForWeekDay(row));
# Add column to the dataframe which calls the function to count each day of month
df['dom_total'] = df['dom'].apply(lambda row: totalForMonthDate(row));
# Show results
print(df)
if df["dom_total"].max() >= df["dow_total"].max():
    # Determine the top day of month result
    top_dom_tot = df.loc[df['dom_total'] == df['dom_total'].max()]
    # isolate the top day of month
    top_day_of_month = (top_dom_tot['dom'][0])
    print('Top day of month is:')
    print(top_day_of_month)
    # find dates in list where the date of month is NOT the highest number
    dfa = df.loc[df['dom'] != top_day_of_month]
    # Determine number of days forwards (positive) or back (negative)
    dfa['days_diff'] = df['dom'] - top_day_of_month
    print('Payments that are not related to the top day per month')
    print(dfa)
Now for the sample CSV payload:
type,party,date, debit , credit
payment,Netflix,22/01/2021,-$19.99, $-
payment,Netflix,22/02/2021,-$19.99, $-
payment,Netflix,22/03/2021,-$19.99, $-
payment,Netflix,22/04/2021,-$19.99, $-
payment,Netflix,24/05/2021,-$19.99, $-
payment,Netflix,22/06/2021,-$19.99, $-
payment,Netflix,22/07/2021,-$19.99, $-
payment,Netflix,23/08/2021,-$19.99, $-
payment,Netflix,22/09/2021,-$19.99, $-
payment,Netflix,22/10/2021,-$19.99, $-
Thanks for any advice!
In totalForMonthDate(), you can replace that series of if statements with two lines:
def totalForMonthDate(monthDay):
    filter = df.dom.value_counts()
    return filter[monthDay]
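This works because `value_counts()` returns a Series indexed by the values it counted, so a single label lookup does what each of the 31 branches did. A quick self-contained check on hypothetical day-of-month values; the `.get(monthDay, 0)` variant is an addition that returns 0 instead of raising `KeyError` for a day that never occurs:

```python
import pandas as pd

# Hypothetical day-of-month values in the style of the sample payload
df = pd.DataFrame({"dom": [22, 22, 22, 24, 22, 23]})

def totalForMonthDate(monthDay):
    counts = df["dom"].value_counts()  # Series: day of month -> count
    return counts.get(monthDay, 0)     # 0 instead of KeyError for unseen days

print(totalForMonthDate(22))  # 4
print(totalForMonthDate(31))  # 0
```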
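Of course, you're also running value_counts() once for every row in the DataFrame, even though the result is the same for the whole DataFrame. That's inefficient. You can replace it by running value_counts() once and using map to convert the values: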
df['dom_total'] = df['dom'].map(df['dom'].value_counts())
Not only is this shorter (1 line versus 4), it's also faster.
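If you prefer groupby, an equivalent formulation is `transform("count")`, which broadcasts each group's row count back onto every row in that group. A small check on hypothetical weekday values; note that `"count"` skips missing values, which makes no difference here because every row has a weekday:

```python
import pandas as pd

df = pd.DataFrame({"dow": ["Friday", "Monday", "Monday", "Thursday", "Monday"]})

# Count computed once, then looked up per row
via_map = df["dow"].map(df["dow"].value_counts())
# Group size broadcast back to each row
via_groupby = df.groupby("dow")["dow"].transform("count")

print(via_map.tolist())      # [1, 3, 3, 1, 3]
print(via_groupby.tolist())  # [1, 3, 3, 1, 3]
```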
You get the SettingWithCopyWarning because you filter the DataFrame with .loc and then modify the filtered subset. The simplest way to fix this is to take a copy when you subset the DataFrame:
dfa = df.loc[df['dom'] != top_day_of_month].copy()
Note: code that later adds new columns to dfa will not affect the original DataFrame.
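To see what that note means in practice: after `.copy()`, `dfa` owns its own data, so adding a column to it leaves `df` untouched. A minimal sketch with hypothetical values:

```python
import pandas as pd

df = pd.DataFrame({"dom": [22, 24, 22, 23]})
top_day_of_month = 22

# Explicit copy: dfa is independent of df
dfa = df.loc[df["dom"] != top_day_of_month].copy()
dfa["days_diff"] = dfa["dom"] - top_day_of_month

print(dfa)
print("days_diff" in df.columns)  # False
```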
Here is the full source code:
import pandas as pd
import io
from datetime import datetime
s = """type,party,date, debit , credit
payment,Netflix,22/01/2021,-$19.99, $-
payment,Netflix,22/02/2021,-$19.99, $-
payment,Netflix,22/03/2021,-$19.99, $-
payment,Netflix,22/04/2021,-$19.99, $-
payment,Netflix,24/05/2021,-$19.99, $-
payment,Netflix,22/06/2021,-$19.99, $-
payment,Netflix,22/07/2021,-$19.99, $-
payment,Netflix,23/08/2021,-$19.99, $-
payment,Netflix,22/09/2021,-$19.99, $-
payment,Netflix,22/10/2021,-$19.99, $- """
df = pd.read_csv(io.StringIO(s))
# Convert to a Date format here
df['new_date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
# Extend data frame with month, day of month and week day
df['month'] = df['new_date'].dt.month
df['dom'] = df['new_date'].dt.day
df['dow'] = df['new_date'].dt.strftime("%A")
# Add a column with the total count of each weekday
df['dow_total'] = df['dow'].map(df['dow'].value_counts())
# Add a column with the total count of each day of month
df['dom_total'] = df['dom'].map(df['dom'].value_counts())
# Show results
print(df)
if df["dom_total"].max() >= df["dow_total"].max():
    # Determine the top day of month result
    top_dom_tot = df.loc[df['dom_total'] == df['dom_total'].max()]
    # isolate the top day of month (first matching row by position, not by index label)
    top_day_of_month = top_dom_tot['dom'].iloc[0]
    print('Top day of month is:')
    print(top_day_of_month)
    # find dates in list where the date of month is NOT the highest number
    dfa = df.loc[df['dom'] != top_day_of_month].copy()
    # Determine number of days forwards (positive) or back (negative)
    dfa['days_diff'] = dfa['dom'] - top_day_of_month
    print('Payments that are not related to the top day per month')
    print(dfa)
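Instead of filtering on the maximum `dom_total` and then picking a row, the most frequent day of month can also be read straight from `value_counts()` with `idxmax()`. A sketch using the day-of-month values from the sample payload; if several days tie for the maximum, only one of them is returned:

```python
import pandas as pd

days = [22, 22, 22, 22, 24, 22, 22, 23, 22, 22]  # dom values from the sample payload
df = pd.DataFrame({"dom": days})

# Most frequent day of month (one of the tied days if there is a tie)
top_day_of_month = df["dom"].value_counts().idxmax()

dfa = df.loc[df["dom"] != top_day_of_month].copy()
dfa["days_diff"] = dfa["dom"] - top_day_of_month
print(top_day_of_month)  # 22
print(dfa)
```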
Anyway, I hope that helps. Cool project!