I have a pandas DataFrame with columns asset1_date, asset1_price, asset2_date, asset2_price, etc. (up to about 500 assets). The dates in asset1_date and asset2_date are not necessarily the same. I want to reshape it into a panel with one column for the asset name, one for the date, and one for the price, i.e.
pd.DataFrame({'asset':['asset1','asset1','asset2','asset2','asset2'],'date':['09/26/2003','09/29/2003','04/10/2007','04/11/2007','04/12/2007'],'price':[102,103,75,74,76]})
Currently, the data looks like:
pd.DataFrame({'asset1_date':['09/26/2003','09/29/2003',np.nan],'asset1_price':[102,103,np.nan],'asset2_date':['04/10/2007','04/11/2007','04/12/2007'],'asset2_price':[75,74,76]})
Could anyone suggest a pandas method to achieve this? Thanks!
This should do the trick:
df = df.stack().reset_index()
df["asset"] = df["level_1"].str.split("_").str[0]
df["col"] = df["level_1"].str.split("_").str[1]
df = (
    df.set_index(["level_0", "col", "asset"])
      .unstack("col")
      .reset_index("level_0", drop=True)
      .reset_index("asset", drop=False)
      .drop("level_1", axis=1, level=0)
)
# Note: the following line is a brute-force approach, since it assumes you want exactly these columns. For an alternative, see:
# https://stackoverflow.com/a/47979382/11610186
df.columns=["asset", "date", "price"]
Output:
    asset        date  price
0  asset1  09/26/2003    102
1  asset2  04/10/2007     75
2  asset1  09/29/2003    103
3  asset2  04/11/2007     74
4  asset2  04/12/2007     76
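Note that the rows above come out interleaved by original row position rather than grouped by asset as in the question. If you want them grouped, one alternative is to build one small frame per asset and concatenate them. This is only a sketch, not part of the approach above: the helper name wide_to_panel is made up for illustration, and it assumes every column is named exactly <asset>_date or <asset>_price and that each asset has both columns.

```python
import numpy as np
import pandas as pd


def wide_to_panel(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape <asset>_date / <asset>_price column pairs into asset/date/price rows.

    Hypothetical helper: assumes each asset has both a _date and a _price column.
    """
    # Asset names in order of first appearance, e.g. "asset1_date" -> "asset1".
    assets = list(dict.fromkeys(col.rsplit("_", 1)[0] for col in wide.columns))

    pieces = []
    for asset in assets:
        piece = pd.DataFrame({
            "asset": asset,
            "date": wide[f"{asset}_date"],
            "price": wide[f"{asset}_price"],
        })
        # Drop padding rows where this asset has neither a date nor a price.
        pieces.append(piece.dropna(subset=["date", "price"], how="all"))

    return pd.concat(pieces, ignore_index=True)


wide = pd.DataFrame({
    "asset1_date": ["09/26/2003", "09/29/2003", np.nan],
    "asset1_price": [102, 103, np.nan],
    "asset2_date": ["04/10/2007", "04/11/2007", "04/12/2007"],
    "asset2_price": [75, 74, 76],
})
panel = wide_to_panel(wide)
print(panel)
```

The result has the rows for asset1 first, then asset2. Because asset1_price contains NaN, the price column ends up as float. If you need real dates rather than strings, pd.to_datetime(panel["date"], format="%m/%d/%Y") should convert them.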