Time series analysis with a for loop in Python

Question

I'm trying to automate the process of predicting (1) the total demand of each State and (2) the demand of each Customer in each State. The statistical method applied is a moving average, and the forecast horizon is 1 month ahead. The data is imported from an Excel sheet with 5 columns: Customer, State, Product, Quantity, Order Date. The Excel file can be found via the link https://drive.google.com/file/d/1JlIqWl8bfyJ3Io01Zx088GIAC6rRuCa8/view?usp=sharing

One Customer can be associated with different States. For example, Aaron Bergman can buy Chair, Art and Phone from stores in Washington, Texas and Oklahoma, and the other customers have the same purchase behaviour. For (1) I tried using a for loop, but it did not work. The error is `Order_Date not in index`.

df = pd.read_excel("Sales_data.xlsx")
State_Name = df.State.unique()
Customer_Name = df.Customer.unique()

for x in State_Name:
   df = df[['Order_Date', 'Quantity']]
   df['Order_Date'].min(), df['Order_Date'].max()
   df.isnull().sum()

   df.Timestamp = pd.to_datetime(df.Order_Date, format= '%D-%M-%Y %H:%m')
   df.index = df.Timestamp
   df = df.resample('MS').sum()

   rolling_mean = df.Quantity.rolling(window=10).mean()
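The `Order_Date not in index` error most likely comes from reassigning `df` inside the loop: the first pass replaces the full table with a monthly resample that no longer carries an `Order_Date` column, so the second pass cannot select it. The loop also never uses `x`, so nothing is filtered by state. Below is a minimal sketch of the same loop that leaves the original frame intact, assuming the columns are named `State`, `Order_Date` and `Quantity` as in the code above; `state_rolling_means` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def state_rolling_means(df: pd.DataFrame, window: int = 10) -> dict:
    """Hypothetical helper: monthly moving average of Quantity for each state."""
    results = {}
    for state in df["State"].unique():
        # Filter into a new variable so `df` keeps every column for the next iteration
        subset = df.loc[df["State"] == state, ["Order_Date", "Quantity"]]
        monthly = (
            subset.assign(Order_Date=pd.to_datetime(subset["Order_Date"]))
            .set_index("Order_Date")["Quantity"]
            .resample("MS")  # month-start bins; months without orders sum to 0
            .sum()
        )
        results[state] = monthly.rolling(window=window).mean()
    return results
```

The result is a dictionary of Series keyed by state; `pd.concat(state_rolling_means(df), names=["State"])` stacks it into a single Series with the state as the outer index level.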

Answer 1

Consider turning the for loop lines into a defined method and calling it with groupby to return the time series. Also, heed best practices in pandas:

Avoid referencing columns as attributes with period qualifiers. Instead, use brackets [].
Avoid [] with a list for column subsetting. Instead, use reindex.
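The two practices above have concrete effects, sketched here on an invented two-row frame. Attribute assignment such as `df.Timestamp = ...` (used in the question) only sets a Python attribute, and pandas emits a `UserWarning` instead of creating a column. A list subset raises `KeyError` (the same "not in index" message as in the question) for a missing column, whereas `reindex` returns the requested columns and fills a missing one with NaN; whether silent NaN is preferable to an error is a judgment call.

```python
import warnings
import pandas as pd

# Invented two-row frame using the question's column names
df = pd.DataFrame({"Order_Date": ["2017-01-05", "2017-02-03"], "Quantity": [2, 4]})

# Attribute assignment only sets a Python attribute; pandas warns and adds no column
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    df.Timestamp = pd.to_datetime(df["Order_Date"])
attr_created_column = bool("Timestamp" in df.columns)  # False

# Bracket assignment creates a real column
df["Timestamp"] = pd.to_datetime(df["Order_Date"])
bracket_created_column = bool("Timestamp" in df.columns)  # True

# A list subset raises KeyError for a missing column ...
try:
    df[["Quantity", "Missing"]]
    list_subset_raised = False
except KeyError:
    list_subset_raised = True

# ... while reindex returns the requested columns and fills the missing one with NaN
reindexed = df.reindex(["Quantity", "Missing"], axis="columns")
```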

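import pandas as pd

def rollmean_func(df):
    # BETTER COLUMN SUBSET
    df = df.reindex(['Order_Date', 'Quantity'], axis='columns')

    # BETTER COLUMN ASSIGNMENT
    # %d = day, %m = month, %M = minute (the original '%D-%M-%Y %H:%m' mixed these up);
    # assumes strings like '31-12-2017 14:30' if Order_Date was not already read as dates
    df['Timestamp'] = pd.to_datetime(df['Order_Date'], format='%d-%m-%Y %H:%M')

    # Resample only Quantity so date columns are not summed
    monthly = df.set_index('Timestamp')['Quantity'].resample('MS').sum()
    rolling_mean = monthly.rolling(window=10).mean()

    return rolling_mean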
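`rolling(window=10)` averages the current month and the previous nine, so the first nine months of every group come out as NaN, which is why the outputs below begin with NaN. A small sketch with invented monthly totals; `min_periods` is an existing `rolling` parameter that lowers how many observations are required before a value is produced, shown here only as an option.

```python
import pandas as pd

# Invented monthly totals, one value per month start
monthly = pd.Series(
    [3, 5, 4, 6, 2, 7, 5, 4, 6, 8, 9, 3],
    index=pd.date_range("2016-01-01", periods=12, freq="MS"),
    name="Quantity",
)

# Needs all 10 months: first 9 results are NaN, the 10th is the mean of months 1-10
strict = monthly.rolling(window=10).mean()

# Produces a value once at least 3 months are available (still at most 10 in the window)
lenient = monthly.rolling(window=10, min_periods=3).mean()
```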

States Level

state_rollmeans = df.groupby(['State']).apply(rollmean_func)
state_rollmeans
# State      Timestamp 
# Alabama    2014-04-01     NaN
#            2014-05-01     NaN
#            2014-06-01     NaN
#            2014-07-01     NaN
#            2014-08-01     NaN
# ...
# Wisconsin  2017-09-01    10.6
#            2017-10-01     7.5
#            2017-11-01     9.7
#            2017-12-01    12.3
# Wyoming    2016-11-01     NaN
# Name: Quantity, Length: 2070, dtype: float64
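Since the question asks for a forecast one month ahead, a simple moving-average forecast typically takes the latest rolling mean of each group as the prediction for the following month. A minimal sketch, assuming a Series with a `(State, Timestamp)` MultiIndex like the output above; `next_month_forecast` is a hypothetical helper, the `Timestamp` level name is taken from that output, and the toy values in the usage are invented. Passing `level="Customer_Name"` would apply the same idea to the customer-level result.

```python
import pandas as pd

def next_month_forecast(rollmeans: pd.Series, level: str) -> pd.DataFrame:
    """Hypothetical helper: use each group's latest moving average as next month's forecast."""
    # tail(1) keeps the most recent month of every group, even if its value is NaN
    latest = rollmeans.groupby(level=level).tail(1)
    table = latest.rename("Forecast").reset_index()
    # The forecast applies to the month after the last observed month
    table["Forecast_Month"] = table["Timestamp"] + pd.offsets.MonthBegin(1)
    return table.set_index(level)[["Forecast_Month", "Forecast"]]
```

A group with fewer months than the window has NaN as its latest value, so its forecast stays NaN rather than being based on partial data.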

Customers Level

customer_rollmeans = df.groupby(['Customer_Name']).apply(rollmean_func)
customer_rollmeans
# Customer_Name       Timestamp 
# Aaron Bergman       2014-02-01    NaN
#                     2014-03-01    NaN
#                     2014-04-01    NaN
#                     2014-05-01    NaN
#                     2014-06-01    NaN
# ...
# Zuschuss Donatelli  2017-02-01    1.2
#                     2017-03-01    0.7
#                     2017-04-01    0.7
#                     2017-05-01    0.0
#                     2017-06-01    0.3
# Name: Quantity, Length: 26818, dtype: float64

Answer 1 (score: 0, accepted 2020-10-17 22:21:48)