
How to fill in rows for missing combinations in pandas

I have the following pandas dataframe:

import pandas as pd
foo = pd.DataFrame({'id': [1,1,1,2,2,2,3,3,3,3,3], 'time': [2,3,5,1,3,4,1,2,6,7,8],
                    'val':['a','a','a','a','a','a','a','a','a','a','a']})

    id  time val
0    1     2   a
1    1     3   a
2    1     5   a
3    2     1   a
4    2     3   a
5    2     4   a
6    3     1   a
7    3     2   a
8    3     6   a
9    3     7   a
10   3     8   a

For each id, I would like to add a row for every missing time, with val set to 'b'. time should start from 1 and run up to the largest time observed for that id.

The resulting dataframe would look like this

foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3], 'time': [1,2,3,4,5,1,2,3,4,1,2,3,4,5,6,7,8],
                    'val':['b','a','a','b','a','a','b','a','a','a','a','b','b','b','a','a','a']})


    id  time val
0    1     1   b
1    1     2   a
2    1     3   a
3    1     4   b
4    1     5   a
5    2     1   a
6    2     2   b
7    2     3   a
8    2     4   a
9    3     1   a
10   3     2   a
11   3     3   b
12   3     4   b
13   3     5   b
14   3     6   a
15   3     7   a
16   3     8   a

Any ideas how I could do that in Python?

This answer does not work here, because it does not group by id, and it also misses the fact that for id == 1 the row for time == 1 is missing.

Set time as the index of the dataframe, then reindex time per id and fill the NaN values in the val column with 'b':

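(
    foo
    .set_index('time').groupby('id')
    # per id, reindex onto 1..max(time); times absent from the group become NaN rows
    .apply(lambda g: g.reindex(range(1, g.index.max() + 1)))
    # errors='ignore' keeps the chain working on pandas versions where
    # apply no longer passes the grouping column to the function
    .drop(columns='id', errors='ignore').fillna({'val': 'b'}).reset_index()
)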
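
To see what the reindex step does, it may help to run it on a single group. The sketch below takes only id == 1 and fills its gaps; the names g and full are just illustrative, and a named RangeIndex is used so that the index name survives the reindex without relying on any implicit behaviour.

```python
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
                    'time': [2, 3, 5, 1, 3, 4, 1, 2, 6, 7, 8],
                    'val': ['a'] * 11})

# take one group (id == 1) and index it by time
g = foo[foo['id'] == 1].set_index('time')

# reindex onto 1..max(time); times missing from the group become NaN rows
target = pd.RangeIndex(1, g.index.max() + 1, name='time')
full = g.reindex(target)

# fill only 'val' with the placeholder, then restore the id for the new rows
full = full.fillna({'val': 'b'}).assign(id=1).reset_index()
print(full)
```

The groupby/apply version above simply repeats this for every id and lets pandas put the id back as an index level.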

If you want to try something fancier, here is another solution:

(
    foo.groupby('id')['time'].max()
      .map(range).explode().add(1).reset_index(name='time')
      .merge(foo, how='left').fillna({'val': 'b'})
)

    id  time val
0    1     1   b
1    1     2   a
2    1     3   a
3    1     4   b
4    1     5   a
5    2     1   a
6    2     2   b
7    2     3   a
8    2     4   a
9    3     1   a
10   3     2   a
11   3     3   b
12   3     4   b
13   3     5   b
14   3     6   a
15   3     7   a
16   3     8   a
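
The chained version can be hard to follow, so here is a rough step-by-step sketch of the same idea: build every (id, time) pair that should exist, then left-merge the original data onto it. Note that it swaps map(range).explode() for a plain list comprehension, which is my own substitution so that time stays an integer column; max_time, grid and result are illustrative names.

```python
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
                    'time': [2, 3, 5, 1, 3, 4, 1, 2, 6, 7, 8],
                    'val': ['a'] * 11})

# 1. largest observed time for each id
max_time = foo.groupby('id')['time'].max()

# 2. one row per (id, time) for time = 1..max of that id
grid = pd.DataFrame(
    [(i, t) for i, m in max_time.items() for t in range(1, m + 1)],
    columns=['id', 'time'],
)

# 3. a left merge keeps every grid row; pairs absent from foo get NaN in 'val'
result = grid.merge(foo, on=['id', 'time'], how='left').fillna({'val': 'b'})
print(result)
```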

One option is complete from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

# build a range of numbers for each group, starting from 1
new_time = {'time': lambda df: range(1, df.max() + 1)}

foo.complete(new_time, by = 'id', fill_value = 'b')

    id  time val
0    1     1   b
1    1     2   a
2    1     3   a
3    1     4   b
4    1     5   a
5    2     1   a
6    2     2   b
7    2     3   a
8    2     4   a
9    3     1   a
10   3     2   a
11   3     3   b
12   3     4   b
13   3     5   b
14   3     6   a
15   3     7   a
16   3     8   a
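
Whichever approach you pick, it is probably worth checking the result against the expected frame from the question rather than comparing the printouts by eye. A minimal sketch, assuming the reindex approach as the candidate (any of the other results could be dropped in instead); expected and result are illustrative names:

```python
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
                    'time': [2, 3, 5, 1, 3, 4, 1, 2, 6, 7, 8],
                    'val': ['a'] * 11})

# the target frame, copied from the question
expected = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
    'time': [1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8],
    'val': ['b', 'a', 'a', 'b', 'a', 'a', 'b', 'a', 'a',
            'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a'],
})

# candidate result: reindex each id onto 1..max(time), fill the gaps with 'b'
result = (
    foo.set_index('time').groupby('id')
       .apply(lambda g: g.reindex(pd.RangeIndex(1, g.index.max() + 1, name='time')))
       .drop(columns='id', errors='ignore')  # 'id' may or may not be passed to apply
       .fillna({'val': 'b'})
       .reset_index()
)

# raises AssertionError with a diff if values, columns or index differ
pd.testing.assert_frame_equal(result, expected)
```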
