How to populate row of pandas dataframe based on previous row and column condition?
I have a dataframe:
ID 2000-01 2000-02 2000-03 2001-01 2001-02 val
1 2847 2861 2875 2890 2904 94717
2 1338 1343 1348 1353 1358 70105
3 3301 3311 3321 3331 3341 60307
4 1425 1422 1419 1416 1413 79888
I want to add a new row to the table that holds the difference between the current year and the previous year, e.g. "2001-01" - "2000-01".
Output:
ID 2000-01 2000-02 2000-03 2001-01 2001-02 val
1 2847 2861 2875 2890 2904 94717
2 1338 1343 1348 1353 1358 70105
3 3301 3311 3321 3331 3341 60307
4 1425 1422 1419 1416 1413 79888
5 NaN NaN NaN -9 -9 NaN
How do I select the column name for the previous year without hard-coding the column header?
Here is code that will do what you ask. The "if" condition can be modified so that it does a better job of detecting columns that contain years. Currently, it only checks that splitting the column name on "-" yields exactly two parts.
import math

import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "2000-01": [2847, 1338, 3301, 1425],
                   "2000-02": [2861, 1343, 3311, 1422],
                   "2000-03": [2875, 1348, 3321, 1419],
                   "2001-01": [2890, 1353, 3331, 1416],
                   "2001-02": [2904, 1358, 3341, 1413],
                   "val": [94717, 70105, 60307, 79888]})
# use ID as the index
df = df.set_index("ID")
# maps each year column to the name of the same period one year earlier
ly_dict = {}
# remember the original columns and their order
mylist = df.columns.copy()
# previous-year names, and the columns that look like "year-period"
myempty_list = []
usable_cols = []
for item in mylist:
    # split e.g. "2001-01" into ["2001", "01"]
    ha = item.split("-")
    if len(ha) == 2:
        ly = str(int(ha[0]) - 1) + "-" + ha[1]
        myempty_list.append(ly)
        usable_cols.append(item)
        # record the previous-year column for this column
        ly_dict[item] = ly
# add missing previous-year columns (filled with NaN) so every lookup succeeds
combined_list = list(set(list(mylist) + myempty_list))
df = df.reindex(columns=combined_list)
# append an all-NaN row; this assumes the IDs run from 1 to the number of rows
last_row_id = df.shape[0] + 1
df.loc[last_row_id] = [math.nan for item in range(df.shape[1])]
# fill the new row with (last data row) - (same row, one year earlier)
for item in usable_cols:
    try:
        df.loc[last_row_id, item] = df.loc[last_row_id - 1, item] - df.loc[last_row_id - 1, ly_dict[item]]
    except KeyError:
        pass
# drop the helper columns and restore the original column order
df = df.reindex(columns=mylist)
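As mentioned, the "if" check can be made stricter. Below is a minimal sketch of one way to do that with a regular expression, so that a column such as "foo-bar" is skipped instead of failing on int(). The names YEAR_COL, previous_year_column and add_yoy_row are made up for this illustration and are not part of pandas. The pattern assumes column names shaped like "YYYY-NN", and, as in the code above, the new row is computed from the last existing row only.

```python
import re

import pandas as pd

# Assumed column format: four-digit year, a dash, a two-digit period.
YEAR_COL = re.compile(r"^(\d{4})-(\d{2})$")


def previous_year_column(col):
    """Return the name of the same period one year earlier, or None."""
    match = YEAR_COL.match(str(col))
    if match is None:
        return None
    year, period = match.groups()
    return f"{int(year) - 1}-{period}"


def add_yoy_row(df):
    """Return a copy of df with one extra row: last row minus previous year."""
    new_id = df.index.max() + 1  # assumes a numeric index
    last = df.iloc[-1]
    # reindex adds the new label as an all-NaN row
    out = df.reindex(list(df.index) + [new_id])
    for col in df.columns:
        prev = previous_year_column(col)
        # only fill cells whose previous-year column really exists
        if prev is not None and prev in df.columns:
            out.loc[new_id, col] = last[col] - last[prev]
    return out


df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "2000-01": [2847, 1338, 3301, 1425],
                   "2000-02": [2861, 1343, 3311, 1422],
                   "2000-03": [2875, 1348, 3321, 1419],
                   "2001-01": [2890, 1353, 3331, 1416],
                   "2001-02": [2904, 1358, 3341, 1413],
                   "val": [94717, 70105, 60307, 79888]}).set_index("ID")
result = add_yoy_row(df)
print(result)
```

Because the previous-year column is looked up in the existing columns, no temporary "1999-xx" columns are needed; cells without a matching earlier column simply stay NaN.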