I have the following dataset:
input_df = pd.DataFrame({'Product':['Computer']*5 + ['Television']*7,
'Module':['Display']*5 + ['Power Supply']*7,
'TTF':[1,2,3,4,6,1,2,3,4,5,7,8]})
I would like to insert missing rows (index 4 and 11) in order to get the following dataset:
output_df = pd.DataFrame({'Product':['Computer']*6 + ['Television']*8,
'Module':['Display']*6 + ['Power Supply']*8,
'TTF':[1,2,3,4,5,6,1,2,3,4,5,6,7,8]})
What is the most efficient way to insert those rows (my real dataset is actually pretty big with a lot of different categories).
I have found a related post: How would I insert missing rows into this data set? However in this post, the index range doesn't vary from one product to another (always [1 to 8] unlike in my case where it is [1 to 6] for Computer and then [1 to 8] for Television.
Use DataFrame.groupby
with DataFrame.reindex
per groups:
df = (input_df.set_index('TTF')
.groupby(['Product','Module'], group_keys=False)
.apply(lambda x: x.reindex(range(x.index.min(),
x.index.max() + 1), method='ffill'))
.reset_index()
)
print (df)
TTF Product Module
0 1 Computer Display
1 2 Computer Display
2 3 Computer Display
3 4 Computer Display
4 5 Computer Display
5 6 Computer Display
6 1 Television Power Supply
7 2 Television Power Supply
8 3 Television Power Supply
9 4 Television Power Supply
10 5 Television Power Supply
11 6 Television Power Supply
12 7 Television Power Supply
13 8 Television Power Supply
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.