
How to fill rows with missing combinations in pandas

I have the following pandas DataFrame:

import pandas as pd
foo = pd.DataFrame({'id': [1,1,1,2,2,2,3,3,3,3,3], 'time': [2,3,5,1,3,4,1,2,6,7,8],
                    'val':['a','a','a','a','a','a','a','a','a','a','a']})

    id  time val
0    1     2   a
1    1     3   a
2    1     5   a
3    2     1   a
4    2     3   a
5    2     4   a
6    3     1   a
7    3     2   a
8    3     6   a
9    3     7   a
10   3     8   a

For each id, I would like to add a row for each missing time, where val would be 'b'. time should start from 1.

The resulting DataFrame would look like this:

foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3], 'time': [1,2,3,4,5,1,2,3,4,1,2,3,4,5,6,7,8],
                    'val':['b','a','a','b','a','a','b','a','a','a','a','b','b','b','a','a','a']})


    id  time val
0    1     1   b
1    1     2   a
2    1     3   a
3    1     4   b
4    1     5   a
5    2     1   a
6    2     2   b
7    2     3   a
8    2     4   a
9    3     1   a
10   3     2   a
11   3     3   b
12   3     4   b
13   3     5   b
14   3     6   a
15   3     7   a
16   3     8   a

Any ideas how I could do that in Python?

This answer does not work, because it does not take the grouping by id into account, and also because for id == 1 I am missing time == 1.
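To make the per-id requirement concrete, here is a small sketch of what goes wrong without grouping. It assumes a naive approach that reindexes over the full product of all ids and the global time range; that approach is an illustration chosen here, not necessarily what the referenced answer does.

```python
import pandas as pd

foo = pd.DataFrame({'id': [1,1,1,2,2,2,3,3,3,3,3], 'time': [2,3,5,1,3,4,1,2,6,7,8],
                    'val': ['a','a','a','a','a','a','a','a','a','a','a']})

# Assumed naive approach: every id paired with every time from 1 to the GLOBAL max
full_index = pd.MultiIndex.from_product(
    [foo['id'].unique(), range(1, foo['time'].max() + 1)],
    names=['id', 'time'],
)
naive = foo.set_index(['id', 'time']).reindex(full_index, fill_value='b').reset_index()

# 3 ids x 8 times = 24 rows, but the desired result has only 17:
# id 1 should stop at time 5 and id 2 at time 4
print(len(naive))
print(naive.groupby('id')['time'].max().tolist())
```

The upper bound of the time range therefore has to be computed per id, which is what the answers below do.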

Set the index of the DataFrame to time, then reindex it per id over the range from 1 to that id's maximum time, and fill the resulting NaN values in the val column with 'b':

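(
    foo
    .set_index('time').groupby('id')
    # per id, reindex on time over 1..max(time); missing times become NaN rows
    .apply(lambda g: g.reindex(range(1, g.index.max() + 1)))
    # 'id' is already in the group index; errors='ignore' covers pandas
    # versions that do not pass the grouping column to apply
    .drop(columns='id', errors='ignore').fillna({'val': 'b'}).reset_index()
)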
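The same reindex idea can also be written as an explicit loop over the groups, which avoids relying on how groupby.apply treats the grouping column. This is only a sketch: fill_missing_times is a hypothetical helper name, and the column names id, time and val are hard-coded from the question.

```python
import pandas as pd

def fill_missing_times(df, fill='b'):
    """Hypothetical helper: per id, add a row for every time in 1..max(time)."""
    parts = []
    for key, grp in df.groupby('id'):
        full_range = range(1, grp['time'].max() + 1)
        # reindexing on time inserts NaN for the missing times, which we then fill
        vals = grp.set_index('time')['val'].reindex(full_range).fillna(fill)
        part = vals.rename_axis('time').reset_index()
        part.insert(0, 'id', key)
        parts.append(part)
    return pd.concat(parts, ignore_index=True)

foo = pd.DataFrame({'id': [1,1,1,2,2,2,3,3,3,3,3], 'time': [2,3,5,1,3,4,1,2,6,7,8],
                    'val': ['a','a','a','a','a','a','a','a','a','a','a']})
result = fill_missing_times(foo)
print(result)
```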

If you want to try something fancy, here is another solution:

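(
    # largest time per id, expanded to 1..max; explode() yields object dtype,
    # so cast back to int before merging
    foo.groupby('id')['time'].max()
      .map(range).explode().add(1).astype(int).reset_index(name='time')
      # left-merge the original rows; (id, time) pairs with no match get 'b'
      .merge(foo, how='left').fillna({'val': 'b'})
)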

    id  time val
0    1     1   b
1    1     2   a
2    1     3   a
3    1     4   b
4    1     5   a
5    2     1   a
6    2     2   b
7    2     3   a
8    2     4   a
9    3     1   a
10   3     2   a
11   3     3   b
12   3     4   b
13   3     5   b
14   3     6   a
15   3     7   a
16   3     8   a
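For readers who find the map/explode chain hard to follow, the same merge technique can be sketched with the (id, time) grid built explicitly. The names max_time, grid and out are illustrative only and do not come from the answer.

```python
import pandas as pd

foo = pd.DataFrame({'id': [1,1,1,2,2,2,3,3,3,3,3], 'time': [2,3,5,1,3,4,1,2,6,7,8],
                    'val': ['a','a','a','a','a','a','a','a','a','a','a']})

# largest time seen for each id
max_time = foo.groupby('id')['time'].max()

# every (id, time) pair that should exist, with time running from 1 to that id's max
grid = pd.DataFrame(
    [(i, t) for i, m in max_time.items() for t in range(1, m + 1)],
    columns=['id', 'time'],
)

# a left merge keeps every grid row; pairs absent from foo get NaN in val, then 'b'
out = grid.merge(foo, on=['id', 'time'], how='left').fillna({'val': 'b'})
print(out)
```

Passing on=['id', 'time'] explicitly documents the join keys instead of relying on merge picking up the shared column names.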

One option is complete from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

# build a range of numbers for each group, starting from 1
new_time = {'time': lambda df: range(1, df.max() + 1)}

foo.complete(new_time, by = 'id', fill_value = 'b')

    id  time val
0    1     1   b
1    1     2   a
2    1     3   a
3    1     4   b
4    1     5   a
5    2     1   a
6    2     2   b
7    2     3   a
8    2     4   a
9    3     1   a
10   3     2   a
11   3     3   b
12   3     4   b
13   3     5   b
14   3     6   a
15   3     7   a
16   3     8   a
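Whichever approach is used, the result can be sanity-checked with plain pandas. The sketch below uses the expected frame from the question; is_complete is a hypothetical helper that only checks that each id covers the times 1..max with no gaps or duplicates, in sorted order.

```python
import pandas as pd

foo = pd.DataFrame({'id': [1,1,1,2,2,2,3,3,3,3,3], 'time': [2,3,5,1,3,4,1,2,6,7,8],
                    'val': ['a','a','a','a','a','a','a','a','a','a','a']})
expected = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,3,3],
                         'time': [1,2,3,4,5,1,2,3,4,1,2,3,4,5,6,7,8],
                         'val': ['b','a','a','b','a','a','b','a','a','a','a','b','b','b','a','a','a']})

def is_complete(df):
    """Hypothetical check: every id has exactly the times 1..max(time), in order."""
    return all(
        grp['time'].tolist() == list(range(1, grp['time'].max() + 1))
        for _, grp in df.groupby('id')
    )

print(is_complete(foo))       # False: e.g. id 1 has no row for time 1 or 4
print(is_complete(expected))  # True
```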
