Is there a better way to write a recursive `df.loc(t-1)` assignment than to use `.unique()`?
Recursive functions are hard to vectorize because each input at time t depends on the previous input at time t-1.
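In the example that follows, the recursion is x at year t equals x at year t-1 plus a at year t-1, with x set to 0 in the first year t_0. Written out (the notation is assumed here, not taken from the original post), this particular recursion unrolls into a running sum of `a`, which is what makes a vectorized version possible in this case:

```latex
x_{t_0} = 0, \qquad x_t = x_{t-1} + a_{t-1} \quad (t > t_0)
\quad\Longrightarrow\quad
x_t = \sum_{s = t_0}^{t-1} a_s
```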
import pandas
df1 = pandas.DataFrame({'year':range(2020,2024),'a':range(3,7)})
# Set the initial value
t0 = min(df1.year)
df1.loc[df1.year==t0, "x"] = 0
This assignment does not work when the right-hand side of the equation is a `pandas.core.series.Series`:
for t in range(min(df1.year)+1, max(df1.year)+1):
    df1.loc[df1.year==t, "x"] = df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]
print(df1)
# year a x
# 0 2020 3 0.0
# 1 2021 4 NaN
# 2 2022 5 NaN
# 3 2023 6 NaN
print(type(df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]))
# <class 'pandas.core.series.Series'>
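A likely explanation for the NaN values is index alignment. When the right-hand side of a `.loc` assignment is a Series, pandas matches it to the selected rows by index label rather than by position. Here the right-hand side carries the label of the year t-1 row (e.g. 0), while the left-hand side selects the year t row (e.g. 1); the labels never match, so NaN is written. The sketch below rebuilds the same toy frame under the illustrative name `df` and makes the labels visible:

```python
import pandas as pd

df = pd.DataFrame({"year": range(2020, 2024), "a": range(3, 7)})
df.loc[df.year == 2020, "x"] = 0

# Right-hand side built from the 2020 row keeps that row's index label
rhs = df.loc[df.year == 2020, "x"] + df.loc[df.year == 2020, "a"]
print(rhs.index.tolist())                        # [0]
print(df.loc[df.year == 2021].index.tolist())    # [1]: label of the target row

# Series on the right: pandas aligns on label 1, which rhs lacks -> NaN
df.loc[df.year == 2021, "x"] = rhs
print(df.loc[1, "x"])                            # nan

# Same value with the index stripped: written by position instead
df.loc[df.year == 2021, "x"] = rhs.to_numpy()
print(df.loc[1, "x"])                            # 3.0
```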
The assignment works when the right-hand side of the equation is a numpy array:
for t in range(min(df1.year)+1, max(df1.year)+1):
    df1.loc[df1.year==t, "x"] = (df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]).unique()
    #break
print(df1)
# year a x
# 0 2020 3 0.0
# 1 2021 4 3.0
# 2 2022 5 7.0
# 3 2023 6 12.0
print(type((df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]).unique()))
# <class 'numpy.ndarray'>
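`.unique()` appears to work here only because it returns a NumPy array, which pandas writes by position instead of aligning on the index. It also removes duplicate values, so with more than one row per year the array could end up shorter than the selection it is assigned to. A sketch of the same loop using `Series.to_numpy()`, which strips the index without deduplicating, assuming each year occurs in exactly one row:

```python
import pandas as pd

df1 = pd.DataFrame({"year": range(2020, 2024), "a": range(3, 7)})
t0 = df1.year.min()
df1.loc[df1.year == t0, "x"] = 0

for t in range(t0 + 1, df1.year.max() + 1):
    prev = df1.year == t - 1
    # .to_numpy() drops the index, so the value is written by position
    # into the row selected on the left instead of being aligned on labels
    df1.loc[df1.year == t, "x"] = (df1.loc[prev, "x"] + df1.loc[prev, "a"]).to_numpy()

print(df1)
```

Unlike `.unique()`, `.to_numpy()` keeps every value, so a length mismatch would surface as an error rather than being hidden by deduplication.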
When the `.loc()` selection uses the year as the index, the assignment works directly:
# Keep only "a" so that "x" is recomputed from scratch
df2 = df1.set_index("year")[["a"]].copy()
# Set the initial value
df2.loc[df2.index.min(), "x"] = 0
for t in range(df2.index.min()+1, df2.index.max()+1):
    df2.loc[t, "x"] = df2.loc[t-1, "x"] + df2.loc[t-1,"a"]
    #break
print(df2)
# a x
# year
# 2020 3 0.0
# 2021 4 3.0
# 2022 5 7.0
# 2023 6 12.0
print(type(df2.loc[t-1, "x"] + df2.loc[t-1,"a"]))
# <class 'numpy.float64'>
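`type(df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"])` is a pandas Series, whereas `type(df2.loc[t-1, "x"] + df2.loc[t-1,"a"])` is a numpy float. Why are these types different? Is it because of `set_index()`?
Is there a better way to write a recursive `.loc()` assignment than to use `.unique()`?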
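On the type question: with a boolean mask as the row indexer, `.loc` returns a Series even when exactly one row matches, since a mask can in general select any number of rows. With a unique index, one scalar row label plus one column label selects a single cell, so `.loc` returns a scalar, and arithmetic on scalars yields a NumPy scalar with no index to align. That would explain why the `set_index()` version assigns directly. A small check on the same toy data:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"year": range(2020, 2024), "a": range(3, 7)})
df2 = df1.set_index("year")

by_mask = df1.loc[df1.year == 2020, "a"]   # boolean row mask -> Series
by_label = df2.loc[2020, "a"]              # scalar label on a unique index -> scalar

print(type(by_mask).__name__)              # Series
print(by_mask.index.tolist())              # [0]
print(isinstance(by_label, np.integer))    # True
```

If the `year` index contained duplicates, `df2.loc[2020, "a"]` would also return a Series, and the alignment issue would come back.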
Sorry if I misunderstood, but is this what you want?
df1['x']= df1['a'].cumsum().shift().fillna(0)
print(df1)
Output:
year a x
0 2020 3 0.0
1 2021 4 3.0
2 2022 5 7.0
3 2023 6 12.0
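The `cumsum().shift().fillna(0)` answer works because this particular recursion is a running sum of `a` over the previous years. If the recursion were not a plain sum, for example a hypothetical `x[t] = rho * x[t-1] + a[t-1]` with a made-up coefficient `rho`, one straightforward option is to loop over a NumPy array and assign the column once at the end, avoiding a `.loc` lookup per year. A sketch, assuming the rows are sorted by year with one row per year; the function `recurse` and the column `x_damped` are illustrative names, not from the original post:

```python
import numpy as np
import pandas as pd

def recurse(a, x0=0.0, rho=1.0):
    """x[0] = x0, x[i] = rho * x[i-1] + a[i-1]; rho is a hypothetical coefficient."""
    a = np.asarray(a, dtype=float)
    x = np.empty_like(a)
    x[0] = x0
    for i in range(1, len(a)):
        x[i] = rho * x[i - 1] + a[i - 1]
    return x

df1 = pd.DataFrame({"year": range(2020, 2024), "a": range(3, 7)})
df1 = df1.sort_values("year")

# rho=1 reproduces the recursion from the question ...
df1["x"] = recurse(df1["a"])
# ... and agrees with the vectorized running sum from the answer
assert np.allclose(df1["x"], df1["a"].cumsum().shift().fillna(0))

# A non-additive variant that cumsum alone cannot express
df1["x_damped"] = recurse(df1["a"], rho=0.5)
print(df1)
```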