How to populate a row of a pandas DataFrame based on the previous row and a column condition?

I have a dataframe:

ID  2000-01 2000-02 2000-03 2001-01 2001-02 val
1   2847    2861    2875    2890    2904    94717
2   1338    1343    1348    1353    1358    70105
3   3301    3311    3321    3331    3341    60307
4   1425    1422    1419    1416    1413    79888

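I want to add a new row to the table containing, for each column, the difference between that month and the same month of the previous year, computed on the preceding row (ID 4). For example, "2001-01" - "2000-01" = 1416 - 1425 = -9. Columns without a previous-year counterpart (all "2000-xx" columns and "val") stay NaN.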

Output:

ID  2000-01 2000-02 2000-03 2001-01 2001-02 val
1   2847    2861    2875    2890    2904    94717
2   1338    1343    1348    1353    1358    70105
3   3301    3311    3321    3331    3341    60307
4   1425    1422    1419    1416    1413    79888
5   NaN     NaN     NaN     -9      -9      NaN

How do I select the column name for the previous year without hard-coding the column headers?
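A minimal sketch of the core idea: derive the previous-year name from the header string itself instead of hard-coding it. It assumes every date column is named exactly "YYYY-MM"; the helper name previous_year_column is hypothetical. A regular expression is used instead of a length check, so headers such as "val" or "a-b" are rejected rather than raising a ValueError.

```python
import re

# Hypothetical helper: only headers shaped exactly like "YYYY-MM" count as date columns.
YEAR_MONTH = re.compile(r"(\d{4})-(\d{2})")


def previous_year_column(name):
    """Return the same month one year earlier, or None if name is not "YYYY-MM"."""
    match = YEAR_MONTH.fullmatch(name)
    if match is None:
        return None
    year, month = match.groups()
    return f"{int(year) - 1:04d}-{month}"


columns = ["2000-01", "2000-02", "2000-03", "2001-01", "2001-02", "val"]
print({c: previous_year_column(c) for c in columns})
# {'2000-01': '1999-01', '2000-02': '1999-02', '2000-03': '1999-03', '2001-01': '2000-01', '2001-02': '2000-02', 'val': None}
```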

Here is code that does what you ask. The "if" condition could be tightened to detect year columns more reliably: currently it only checks whether splitting the column name on "-" yields exactly two parts.

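import pandas as pd
import math


df = pd.DataFrame({"ID":      [1, 2, 3, 4],
                   "2000-01": [2847, 1338, 3301, 1425],
                   "2000-02": [2861, 1343, 3311, 1422],
                   "2000-03": [2875, 1348, 3321, 1419],
                   "2001-01": [2890, 1353, 3331, 1416],
                   "2001-02": [2904, 1358, 3341, 1413],
                   "val":     [94717, 70105, 60307, 79888]})
# use ID as the row label
df = df.set_index("ID")

# maps each "YYYY-MM" column to the same month of the previous year,
# e.g. "2001-01" -> "2000-01"
ly_dict = {}

# remember the original columns so their order can be restored at the end
mylist = df.columns.copy()

# myempty_list: previous-year column names (some of them may not exist yet)
# usable_cols: columns whose name splits into exactly two parts on "-"
myempty_list = []
usable_cols = []
for item in mylist:
    # split "YYYY-MM" into [year, month]
    ha = item.split("-")

    if len(ha) == 2:
        ly = str(int(ha[0]) - 1) + "-" + ha[1]
        myempty_list.append(ly)
        usable_cols.append(item)
        ly_dict[item] = ly

# add any missing previous-year columns (filled with NaN) so every lookup succeeds;
# dict.fromkeys removes duplicates while keeping a stable column order
combined_list = list(dict.fromkeys(list(mylist) + myempty_list))
df = df.reindex(columns=combined_list)

# label the new row after the current last row instead of assuming IDs run 1..n
prev_id = df.index[-1]
last_row_id = prev_id + 1
df.loc[last_row_id] = [math.nan for _ in range(df.shape[1])]

# difference to the same month of the previous year, taken from the previous row;
# the result stays NaN when the previous-year column is only a NaN placeholder
for item in usable_cols:
    df.loc[last_row_id, item] = df.loc[prev_id, item] - df.loc[prev_id, ly_dict[item]]

# restore the original columns, dropping the temporary placeholder columns
df = df.reindex(columns=mylist)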
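If the temporary placeholder columns are not wanted, the same row can be built in one step. This is an alternative sketch, not the method used in the answer above. It assumes the "YYYY-MM" naming, that the differences come from the last existing row (as in the expected output), and that the new row is labeled one past the last ID. Series.get supplies NaN for missing previous-year columns, and assigning a Series through .loc aligns it on the column labels, so "val" is filled with NaN automatically.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "2000-01": [2847, 1338, 3301, 1425],
                   "2000-02": [2861, 1343, 3311, 1422],
                   "2000-03": [2875, 1348, 3321, 1419],
                   "2001-01": [2890, 1353, 3331, 1416],
                   "2001-02": [2904, 1358, 3341, 1413],
                   "val": [94717, 70105, 60307, 79888]}).set_index("ID")

last = df.iloc[-1]  # the row the differences are taken from (ID 4 here)

# Map each "YYYY-MM" column to the same month one year earlier.
prev_name = {}
for col in df.columns:
    m = re.fullmatch(r"(\d{4})-(\d{2})", col)
    if m:
        prev_name[col] = f"{int(m.group(1)) - 1:04d}-{m.group(2)}"

# Series.get returns the default (NaN) when the previous-year column does not exist.
diff = {col: last[col] - last.get(prev, np.nan) for col, prev in prev_name.items()}

# Setting with enlargement: the Series is aligned to df.columns, missing labels become NaN.
df.loc[df.index[-1] + 1] = pd.Series(diff)
print(df)
```

Because the new row contains NaN, pandas upcasts the affected integer columns to float64, just as the loop-based version does.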

