Time series analysis For loop Python
I'm trying to automate the process of predicting (1) the total demand of each State and (2) the demand of each Customer in each State. The statistical method applied is a moving average, and the forecast horizon is 1 month ahead.

The data is imported from an Excel sheet with 5 columns: Customer, State, Product, Quantity, Order Date. The Excel file can be found via the link https://drive.google.com/file/d/1JlIqWl8bfyJ3Io01Zx088GIAC6rRuCa8/view?usp=sharing

One Customer can be associated with different States; for example, Aaron Bergman can buy Chair, Art, Phone from stores in Washington, Texas and Oklahoma. The other customers have the same purchase behaviour.

For (1) I tried using a for loop, but it did not work. The error is: Order_Date not in index
import pandas as pd

df = pd.read_excel("Sales_data.xlsx")
State_Name = df.State.unique()
Customer_Name = df.Customer.unique()
for x in State_Name:
    df = df[['Order_Date', 'Quantity']]
    df['Order_Date'].min(), df['Order_Date'].max()
    df.isnull().sum()
    df.Timestamp = pd.to_datetime(df.Order_Date, format= '%D-%M-%Y %H:%m')
    df.index = df.Timestamp
    df = df.resample('MS').sum()
    rolling_mean = df.Quantity.rolling(window=10).mean()

Consider turning the for loop lines into a defined method and calling it with groupby to return time series. Also, heed best practices in pandas: assign columns with brackets [] rather than attribute access such as df.Timestamp, and avoid using [] with a list for column subsetting. Instead, use reindex.

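import pandas as pd

def rollmean_func(df):
    # BETTER COLUMN SUBSET
    df = df.reindex(['Order_Date', 'Quantity'], axis='columns')
    # BETTER COLUMN ASSIGNMENT
    # Directives corrected (%d day, %m month, %M minute); assumes day-month-year strings
    df['Timestamp'] = pd.to_datetime(df['Order_Date'], format='%d-%m-%Y %H:%M')
    df.index = df['Timestamp']
    # Sum only Quantity per month start (MS); newer pandas refuses to sum datetime columns
    monthly = df['Quantity'].resample('MS').sum()
    # 10-month moving average of the monthly totals
    rolling_mean = monthly.rolling(window=10).mean()
    return rolling_mean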
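
To see what the resample and rolling steps do, here is a minimal sketch on a made-up frame. The orders data and the 3-month window are illustrative assumptions, not the asker's spreadsheet. It suggests two things: months without orders are filled with 0 by the monthly sum, and rolling(window=n) yields NaN until n months are available, so a 10-month window leaves the first nine months of each group empty. The min_periods argument is one way to relax that, at the cost of averaging over fewer months early in the series.

```python
import pandas as pd

# Hypothetical orders; note that February has no rows at all.
orders = pd.DataFrame({
    "Order_Date": pd.to_datetime(
        ["2020-01-05", "2020-01-20", "2020-03-15", "2020-04-20"]
    ),
    "Quantity": [1, 2, 3, 4],
})

# Monthly totals, labelled by month start; the empty February bin sums to 0.
monthly = orders.set_index("Order_Date")["Quantity"].resample("MS").sum()
print(monthly.tolist())  # [3, 0, 3, 4]

# A strict 3-month window: the first two months have too little history.
strict = monthly.rolling(window=3).mean()
print(strict.isna().sum())  # 2

# Allowing shorter windows at the start removes the leading NaN values.
lenient = monthly.rolling(window=3, min_periods=1).mean()
print(lenient.iloc[:3].tolist())  # [3.0, 1.5, 2.0]
```

Whether a missing month should count as zero demand or be left out is a modelling choice; the sketch only shows what the default behaviour appears to be.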

State level

state_rollmeans = df.groupby(['State']).apply(rollmean_func)
state_rollmeans
# State      Timestamp
# Alabama    2014-04-01     NaN
#            2014-05-01     NaN
#            2014-06-01     NaN
#            2014-07-01     NaN
#            2014-08-01     NaN
#                           ...
# Wisconsin  2017-09-01    10.6
#            2017-10-01     7.5
#            2017-11-01     9.7
#            2017-12-01    12.3
# Wyoming    2016-11-01     NaN
# Name: Quantity, Length: 2070, dtype: float64
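
The rolling mean describes past months. For the one-month-ahead prediction the question asks for, one common reading of a moving-average forecast is to take the mean of the last n monthly totals as the estimate for the following month. A hedged sketch of that idea, where forecast_next_month is a hypothetical helper and the totals are invented:

```python
import pandas as pd

def forecast_next_month(monthly_totals, window=3):
    """Return (start of the next month, mean of the last `window` totals)."""
    next_month = monthly_totals.index[-1] + pd.offsets.MonthBegin(1)
    return next_month, monthly_totals.iloc[-window:].mean()

# Hypothetical monthly totals for one state, indexed by month start.
monthly_totals = pd.Series(
    [10, 20, 30, 40],
    index=pd.date_range("2020-01-01", periods=4, freq="MS"),
)

month, forecast = forecast_next_month(monthly_totals)
print(month.date(), forecast)  # 2020-05-01 30.0
```

Applied per state, the same helper could be called on each group's monthly series; the window length is a tuning choice rather than something the thread specifies.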

Customer level

customer_rollmeans = df.groupby(['Customer_Name']).apply(rollmean_func)
customer_rollmeans
# Customer_Name       Timestamp
# Aaron Bergman       2014-02-01    NaN
#                     2014-03-01    NaN
#                     2014-04-01    NaN
#                     2014-05-01    NaN
#                     2014-06-01    NaN
#                                   ...
# Zuschuss Donatelli  2017-02-01    1.2
#                     2017-03-01    0.7
#                     2017-04-01    0.7
#                     2017-05-01    0.0
#                     2017-06-01    0.3
# Name: Quantity, Length: 26818, dtype: float64
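
Part (2) of the question asks for each Customer within each State, which suggests grouping on both columns rather than on the customer alone. A minimal sketch under assumptions: the column names State, Customer, Order_Date and Quantity, the sample rows, and the 2-month window are all illustrative. It loops over the groups and stacks the results with pd.concat, which keeps one row per (State, Customer, month) whether or not the groups share the same dates. This is a swapped-in alternative to groupby(...).apply, not the approach used in the answer.

```python
import pandas as pd

def monthly_rolling_mean(group, window=2):
    """Monthly Quantity totals for one group, then a rolling mean."""
    monthly = group.set_index("Order_Date")["Quantity"].resample("MS").sum()
    return monthly.rolling(window=window).mean()

# Hypothetical rows: Ann buys in two states, Bob in one.
sales = pd.DataFrame({
    "State": ["Texas", "Texas", "Texas", "Texas", "Ohio", "Ohio"],
    "Customer": ["Ann", "Ann", "Bob", "Bob", "Ann", "Ann"],
    "Order_Date": pd.to_datetime(
        ["2020-01-05", "2020-02-10", "2020-01-07", "2020-02-11",
         "2020-02-03", "2020-03-09"]
    ),
    "Quantity": [2, 4, 10, 30, 1, 5],
})

# One rolling-mean series per (State, Customer) pair.
pieces = {
    key: monthly_rolling_mean(group)
    for key, group in sales.groupby(["State", "Customer"])
}

# Stack the pieces into a single Series with a three-level index.
result = pd.concat(pieces, names=["State", "Customer", "Order_Date"])
print(result)
```

With a 2-month window, each pair's first month is NaN and the second is the mean of its two monthly totals.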