[英]Problem loop and computation between different columns on a DataFrame
Here is my problem,这是我的问题,
I have this DataFrame:我有这个 DataFrame:
df=pd.DataFrame({'Date':['2017-05-19','2017-05-22','2017-05-23','2017-05-24','2017-05-25','2017-05-26','2017-05-29'],
'A':[153,152,153,155,153,154,155],
'B':[139,137,141,141.5,141.8,142.1,142.06],})
df['Date']=pd.to_datetime(df['Date'])
df = df.set_index('Date')
My goal was to create another DataFrame named df_end
where I will have a column named "total" corresponding to the total number of column of df
(which is 2) and two column A and B that will take the value "1" if the 1 day rolling mean is > the 2 days rolling mean我的目标是创建另一个名为df_end
的 DataFrame ,其中我将有一个名为“total”的列,对应于df
列的总数(即 2)和两列 A 和 B,如果 1 则取值为“1”日滚动平均值 > 2 天滚动平均值
Here is the code that I wrote:这是我写的代码:
def test(dataframe):
df_init=(dataframe*0).assign(total=0)
df_end=(dataframe*0).assign(total=0)
df_init.iloc[0,df_init.columns.get_loc('total')]=2
df_end.iloc[0,df_end.columns.get_loc('total')]=2
rolling_2D=dataframe.rolling(2).mean().fillna(0)
rolling_1D=dataframe.rolling(1).mean().fillna(0)
calendar=pd.Series(df_init.index)[1:]
for date in calendar:
for cols in dataframe:
prev_date=df_init.index[df_init.index<date][-1]
df_init.loc[date,:]=df_end.loc[prev_date,:] #df_init takes the value of df_end on the previous date
M=2 #number of columns different from the column 'total'
count = (df_init.loc[date,dataframe.columns]!=0).sum() #this will be used to compute the weight
#Weight
if count != M:
weight = 1/(M-count)
else:
weight = 1
#Repartition between different columns
repartition = df_init.loc[date,cols]+df_init.loc[date,'total']*weight
#Conditions
if rolling_1D.loc[date,cols] > rolling_2D.loc[date,cols] and df_init.loc[date,cols]==0:
df_end.loc[date,cols]=repartition
df_end.loc[date,'total']=df_init.loc[date,'total'] - df_end.loc[date,cols]
elif rolling_1D.loc[date,cols] > rolling_2D.loc[date,cols] and df_init.loc[date,cols]!=0:
df_end.loc[date,cols]=df_init.loc[date,cols]
df_end.loc[date,'total']=df_init.loc[date,'total']
else:
df_end.loc[date,cols]=0
df_end.loc[date,'total']=df_init.loc[date,cols]+df_init.loc[date,'total']
return df_end
When I return df_end this is what I have:当我返回 df_end 这就是我所拥有的:
A B total
Date
2017-05-19 0.0 0.0 2.0
2017-05-22 0.0 0.0 2.0
2017-05-23 1.0 1.0 1.0
2017-05-24 1.0 1.0 1.0
2017-05-25 0.0 1.0 1.0
2017-05-26 1.0 1.0 1.0
2017-05-29 1.0 0.0 2.0
As you can see, my program take into account the column B only and I don't know where is my mistake.如您所见,我的程序仅考虑了 B 列,我不知道我的错误在哪里。
The result that I am looking for is:我正在寻找的结果是:
A B total
Date
2017-05-19 0.0 0.0 2.0
2017-05-22 0.0 0.0 2.0
2017-05-23 1.0 1.0 0.0
2017-05-24 1.0 1.0 0.0
2017-05-25 0.0 1.0 1.0
2017-05-26 1.0 1.0 0.0
2017-05-29 1.0 0.0 1.0
I know that I am almost done with this code and I'm stuck with this problem.我知道我几乎完成了这段代码,但我遇到了这个问题。
Do you have any idea where my problem could be?你知道我的问题可能出在哪里吗?
Thank you so much for the help非常感谢你的帮助
Here's a fairly simple way to do that:这是一个相当简单的方法:
new_df = (df.rolling(1).mean() > df.rolling(2).mean()).astype(int)
new_df["total"] = 2 - new_df.sum(axis=1)
print(new_df)
The output is: output 是:
A B total
Date
2017-05-19 0 0 2
2017-05-22 0 0 2
2017-05-23 1 1 0
2017-05-24 1 1 0
2017-05-25 0 1 1
2017-05-26 1 1 0
2017-05-29 1 0 1
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.