I have a long, firm-level table that records each firm's first and last active year and its zip code.
import pandas as pd

df = pd.DataFrame({'Firm': ['A', 'B', 'C'],
                   'FirstYear': [2020, 2019, 2018],
                   'LastYear': [2021, 2022, 2019],
                   'Zipcode': ['00000', '00001', '00003']})
Firm  FirstYear  LastYear  Zipcode
A          2020      2021    00000
B          2019      2022    00001
C          2018      2019    00003
I want panel data that has the zip code for every active year. Ideally, this would be a wide table that imputes the value of Zipcode for the first year, the last year, and every year in between.
It should look like this:
   2020   2021   2019   2022   2018
A  00000  00000
B  00001  00001  00001  00001
C                00003         00003
I have some code to create a long table per row but I have many millions of rows and it takes a long time. What's the best way in terms of performance and memory use to transform the long table I have to impute every year's zipcode value in pandas?
Thanks in advance.
Responding to the answer's update: imagine there is a firm whose active years do not overlap with those of any other firm.
df=pd.DataFrame({'Firm':['A','B','C'],
'FirstYear':[2020, 2019, 1997],
'LastYear':[2021, 2022, 2008],
'Zipcode':['00000','00001','00003']})
The output from the code looks like this, with no columns for the years between 1997 and 2008:
Firm  2020   2021   2019   2022   1997   2008
A     00000  00000
B     00001  00001  00001  00001
C                                 00003  00003
Here is a solution with pd.melt()
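# Reshape to wide: one row per firm, one column per year that
# appears as some firm's first or last year
d = (pd.melt(df, id_vars=['Firm', 'Zipcode'])
       .set_index(['Firm', 'value'])['Zipcode']
       .unstack(level=1))
# Keep a filled value only where both a forward fill and a backward fill
# find one, i.e. between the firm's first and last year
filled = d.ffill(axis=1)
d = (filled
       .where(filled.notna() & d.bfill(axis=1).notna())
       .reindex(df[['FirstYear', 'LastYear']].stack().unique(), axis=1))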
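Note that the unstack step only creates columns for years that occur as some firm's FirstYear or LastYear, so a firm active from 1997 to 2008 gets nothing for 1998 through 2007. One possible way around that, sketched below, is to reindex to the full span of years before filling. This assumes the year columns are integers and that no firm has FirstYear equal to LastYear (that would put duplicate entries in the index and break the unstack). The names `full_years`, `wide` and `panel` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'Firm': ['A', 'B', 'C'],
                   'FirstYear': [2020, 2019, 1997],
                   'LastYear': [2021, 2022, 2008],
                   'Zipcode': ['00000', '00001', '00003']})

# Every calendar year spanned by the data (illustrative name).
full_years = list(range(int(df['FirstYear'].min()),
                        int(df['LastYear'].max()) + 1))

# Wide table with a column for every year, not only the first/last years.
wide = (pd.melt(df, id_vars=['Firm', 'Zipcode'])
          .set_index(['Firm', 'value'])['Zipcode']
          .unstack(level=1)
          .reindex(full_years, axis=1))

# A cell is inside a firm's active span when both a forward fill
# and a backward fill along the row find a value for it.
filled_fwd = wide.ffill(axis=1)
filled_bwd = wide.bfill(axis=1)
panel = filled_fwd.where(filled_fwd.notna() & filled_bwd.notna())
```

With this, `panel.loc['C', 2000]` holds C's zip code, while years outside a firm's span stay missing. The result is a dense firms-by-years table, so memory grows with the number of firms times the number of years covered.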
Original Answer:
(pd.melt(df,id_vars=['Firm','Zipcode'])
.set_index(['Firm','value'])['Zipcode']
.unstack(level=1)
.reindex(df[['FirstYear','LastYear']].stack().unique(),axis=1))
Output:
value   2020   2021   2019   2022   2018
Firm
A      00000  00000    NaN    NaN    NaN
B      00001  00001  00001  00001    NaN
C        NaN    NaN  00003    NaN  00003
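Since the question is about speed on millions of rows, here is a different technique that skips melt/unstack altogether: build the wide table with a NumPy broadcast comparison of each year against every firm's first and last year. This is only a sketch and has not been benchmarked; it assumes integer years and one row per firm. The names `years`, `active` and `panel` are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Firm': ['A', 'B', 'C'],
                   'FirstYear': [2020, 2019, 1997],
                   'LastYear': [2021, 2022, 2008],
                   'Zipcode': ['00000', '00001', '00003']})

# All years covered by any firm, as a 1-D array.
years = np.arange(df['FirstYear'].min(), df['LastYear'].max() + 1)

# Column vectors so the comparison broadcasts to (n_firms, n_years).
first = df['FirstYear'].to_numpy()[:, None]
last = df['LastYear'].to_numpy()[:, None]
active = (years >= first) & (years <= last)

# Put the firm's zip code where it is active, None elsewhere.
zips = df['Zipcode'].to_numpy(dtype=object)[:, None]
values = np.where(active, zips, None)

panel = pd.DataFrame(values, index=df['Firm'], columns=years)
```

The boolean mask `active` is small (one byte per cell), so for very large inputs it may be worth keeping only the mask and the zip code column rather than materialising the full table of strings.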