
Create and fill rows of zeros with pandas

I have two DataFrames, 'inp' and 'IndiRelat'. 'inp' holds the entries of certain data.

import pandas as pd

inp = [[2, 'cvt'  , -3,  5, 17, -2, -9, -0.2, 'RL'],
       [2, 'cv'   ,  0,  0,  0,  0,  0,    0, 'LL'],
       [2, 'sope' ,  0,  0,  0,  0,  0,    0, 'SD'],
       [2, 'wix+' ,-13,-13,  2,  1,-62, -0.5, 'WI'],
       [2, 'wix-' ,  0, 16,  6, 13,  0,  0.3, 'WI'],
       [4, 'sope' ,-42,  0, 29,  0,  0,  -13, 'SD'],
       [4, 'cv'   ,  0,  0,  0,  0,  0,    0, 'LL'],
       [4, 'cvt'  ,  0,  0,  0,  0,  0,   -1, 'RL'],
       [4, 'wix+' ,-18, -2, 19, 19,  3,  -64, 'WI'],
       [4, 'wix-' ,  0,-30, -2, -2, 32,    0, 'WI']]

inp = pd.DataFrame(data = inp, columns = ['Key','Descr', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'Indicator'])

print(inp['Key'])
print(inp['Indicator'])

IndiRelat = ['SD', 'LL', 'RL', 'SS', 'RR', 'WI', 'WI', 'WI', 'WI', 'QU', 'QU']
IndiRelat = pd.DataFrame(IndiRelat)

In the 'inp' DataFrame, each key in inp['Key'] has rows of data, and each row is tied to an indicator in inp['Indicator'].

The idea is to relate this to the second DataFrame, 'IndiRelat', and to create rows of zeros for every entry of 'IndiRelat' that is not present in inp['Indicator'] for a given key.

What I want to get is something like this:

    Key     Descr   C1  C2  C3  C4  C5  C6  Indicator
0   2       cvt     -3  5   17  -2  -9  -0.2    RL
1   2       cv      0   0   0   0   0   0       LL
2   2       sope    0   0   0   0   0   0       SD
3   2       wix+    -13 -13 2   1   -62 -0.5    WI
4   2       wix-    0   16  6   13  0   0.3     WI
5   2       0       0   0   0   0   0   0       WI
6   2       0       0   0   0   0   0   0       WI
7   2       0       0   0   0   0   0   0       QU
8   2       0       0   0   0   0   0   0       QU
9   2       0       0   0   0   0   0   0       SS
10  2       0       0   0   0   0   0   0       RR
11  4       cvt     0   0   0   0   0   -1      RL
12  4       cv      0   0   0   0   0   0       LL
13  4       sope    -42 0   29  0   0   -13     SD
14  4       wix+    -18 -2  19  19  3   -64     WI
15  4       wix-    0   -30 -2  -2  32  0       WI
16  4       0       0   0   0   0   0   0       WI
17  4       0       0   0   0   0   0   0       WI
18  4       0       0   0   0   0   0   0       QU
19  4       0       0   0   0   0   0   0       QU
20  4       0       0   0   0   0   0   0       SS
21  4       0       0   0   0   0   0   0       RR

I would greatly appreciate any ideas or suggestions on how to get this result. Thanks.
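As a quick sanity check on the expected table, the number of zero rows needed for one key can be counted by comparing how often each indicator appears in 'IndiRelat' with how often it appears for that key. The sketch below does this for Key 2 only; the names `have`, `need` and `missing` are made up for illustration and are not part of the original post.

```python
import pandas as pd

# Indicators that Key 2 already has in 'inp' (copied from the question)
have = pd.Series(['RL', 'LL', 'SD', 'WI', 'WI'])
# Indicators required by 'IndiRelat' (copied from the question)
need = pd.Series(['SD', 'LL', 'RL', 'SS', 'RR', 'WI', 'WI', 'WI', 'WI', 'QU', 'QU'])

# Required count minus available count per indicator;
# indicators absent from 'have' are treated as a count of 0
missing = need.value_counts().sub(have.value_counts(), fill_value=0).astype(int)
print(missing)
```

For Key 2 this gives two missing rows each for WI and QU and one each for SS and RR, which matches the six zero rows per key in the table above.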

We can use cumcount to create a unique key, and then reindex:

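# Number repeated indicators within each key (0, 1, ...) so duplicates such as 'WI' get a unique label
inp['new'] = inp.groupby(['Key','Indicator']).cumcount()
# Do the same for the reference list of indicators
IndiRelat[1] = IndiRelat.groupby(0).cumcount()
IndiRelat.columns = ['Indicator','new']

# Move Key into the columns, reindex on every (Indicator, new) pair with zero fill, then go back to long form
out = (inp.set_index(['Key','Indicator','new'])
          .unstack(level=0)
          .reindex(pd.MultiIndex.from_frame(IndiRelat), fill_value=0)
          .stack()
          .reset_index()
          .sort_values('Key'))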
out
Out[93]: 
   Indicator  new  Key Descr  C1  C2  C3  C4  C5    C6
0         SD    0    2  sope   0   0   0   0   0   0.0
18        QU    0    2     0   0   0   0   0   0   0.0
16        WI    3    2     0   0   0   0   0   0   0.0
14        WI    2    2     0   0   0   0   0   0   0.0
12        WI    1    2  wix-   0  16   6  13   0   0.3
20        QU    1    2     0   0   0   0   0   0   0.0
8         RR    0    2     0   0   0   0   0   0   0.0
10        WI    0    2  wix+ -13 -13   2   1 -62  -0.5
6         SS    0    2     0   0   0   0   0   0   0.0
4         RL    0    2   cvt  -3   5  17  -2  -9  -0.2
2         LL    0    2    cv   0   0   0   0   0   0.0
9         RR    0    4     0   0   0   0   0   0   0.0
5         RL    0    4   cvt   0   0   0   0   0  -1.0
11        WI    0    4  wix+ -18  -2  19  19   3 -64.0
13        WI    1    4  wix-   0 -30  -2  -2  32   0.0
3         LL    0    4    cv   0   0   0   0   0   0.0
15        WI    2    4     0   0   0   0   0   0   0.0
17        WI    3    4     0   0   0   0   0   0   0.0
1         SD    0    4  sope -42   0  29   0   0 -13.0
19        QU    0    4     0   0   0   0   0   0   0.0
7         SS    0    4     0   0   0   0   0   0   0.0
21        QU    1    4     0   0   0   0   0   0   0.0
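To see why the cumcount step matters, here is a minimal sketch that looks only at the indicator list. 'WI' and 'QU' repeat, so the indicator alone cannot serve as a unique index label; pairing each value with its running occurrence number can. The names `indicators`, `occurrence` and `pairs` are illustrative and do not come from the answer.

```python
import pandas as pd

# The reference indicators from the question, including repeats
indicators = pd.Series(['SD', 'LL', 'RL', 'SS', 'RR', 'WI', 'WI', 'WI', 'WI', 'QU', 'QU'])

# cumcount numbers each occurrence within its group, starting at 0
occurrence = indicators.groupby(indicators).cumcount()

# (indicator, occurrence) pairs are unique even though indicators repeat
pairs = list(zip(indicators, occurrence))
print(pairs)
print(len(set(pairs)) == len(pairs))
```

The four 'WI' entries become ('WI', 0) through ('WI', 3), which is what lets the reindex above line up each existing 'WI' row with one slot and fill the remaining slots with zeros.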

An outer merge ought to do the trick. As per @BENY's comment and answer, we need to address the non-uniqueness of the key:

>>> df2 = IndiRelat.rename(columns={0: 'Indicator'})
>>> df2['dedup_indic'] = df2.groupby('Indicator').cumcount()
>>> df = inp.join(inp.groupby('Indicator').cumcount().rename('dedup_indic'))\
...         .merge(df2, how='outer')
>>> df
    Key Descr    C1    C2    C3    C4    C5    C6 Indicator  dedup_indic
0   2.0   cvt  -3.0   5.0  17.0  -2.0  -9.0  -0.2        RL            0
1   2.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL            0
2   2.0  sope   0.0   0.0   0.0   0.0   0.0   0.0        SD            0
3   2.0  wix+ -13.0 -13.0   2.0   1.0 -62.0  -0.5        WI            0
4   2.0  wix-   0.0  16.0   6.0  13.0   0.0   0.3        WI            1
5   4.0  sope -42.0   0.0  29.0   0.0   0.0 -13.0        SD            1
6   4.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL            1
7   4.0   cvt   0.0   0.0   0.0   0.0   0.0  -1.0        RL            1
8   4.0  wix+ -18.0  -2.0  19.0  19.0   3.0 -64.0        WI            2
9   4.0  wix-   0.0 -30.0  -2.0  -2.0  32.0   0.0        WI            3
10  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        SS            0
11  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        RR            0
12  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        QU            0
13  NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN        QU            1
>>> df.fillna(0).drop(columns=['dedup_indic'])
    Key Descr    C1    C2    C3    C4    C5    C6 Indicator
0   2.0   cvt  -3.0   5.0  17.0  -2.0  -9.0  -0.2        RL
1   2.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL
2   2.0  sope   0.0   0.0   0.0   0.0   0.0   0.0        SD
3   2.0  wix+ -13.0 -13.0   2.0   1.0 -62.0  -0.5        WI
4   2.0  wix-   0.0  16.0   6.0  13.0   0.0   0.3        WI
5   4.0  sope -42.0   0.0  29.0   0.0   0.0 -13.0        SD
6   4.0    cv   0.0   0.0   0.0   0.0   0.0   0.0        LL
7   4.0   cvt   0.0   0.0   0.0   0.0   0.0  -1.0        RL
8   4.0  wix+ -18.0  -2.0  19.0  19.0   3.0 -64.0        WI
9   4.0  wix-   0.0 -30.0  -2.0  -2.0  32.0   0.0        WI
10  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        SS
11  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        RR
12  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        QU
13  0.0     0   0.0   0.0   0.0   0.0   0.0   0.0        QU
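Note that the merge above yields 14 rows, with Key set to 0.0 in the added rows, while the question asks for the full indicator list under each key (22 rows). One possible variant, sketched here under a few assumptions, first builds a template of every key crossed with every indicator occurrence and then left-merges the data onto it. The names `relat`, `occ`, `template` and `filled` are hypothetical, `how='cross'` needs pandas 1.2 or later, and filling Descr with the string '0' is a guess at what the asker wants.

```python
import pandas as pd

inp = pd.DataFrame(
    [[2, 'cvt',  -3,   5, 17, -2,  -9, -0.2, 'RL'],
     [2, 'cv',    0,   0,  0,  0,   0,    0, 'LL'],
     [2, 'sope',  0,   0,  0,  0,   0,    0, 'SD'],
     [2, 'wix+', -13, -13,  2,  1, -62, -0.5, 'WI'],
     [2, 'wix-',  0,  16,  6, 13,   0,  0.3, 'WI'],
     [4, 'sope', -42,   0, 29,  0,   0,  -13, 'SD'],
     [4, 'cv',    0,   0,  0,  0,   0,    0, 'LL'],
     [4, 'cvt',   0,   0,  0,  0,   0,   -1, 'RL'],
     [4, 'wix+', -18,  -2, 19, 19,   3,  -64, 'WI'],
     [4, 'wix-',  0, -30, -2, -2,  32,    0, 'WI']],
    columns=['Key', 'Descr', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'Indicator'])
relat = pd.DataFrame(
    {'Indicator': ['SD', 'LL', 'RL', 'SS', 'RR', 'WI', 'WI', 'WI', 'WI', 'QU', 'QU']})

# Occurrence counters make repeated indicators unique (per key in 'inp')
inp_occ = inp.assign(occ=inp.groupby(['Key', 'Indicator']).cumcount())
relat = relat.assign(occ=relat.groupby('Indicator').cumcount())

# Template: every key combined with every (Indicator, occ) pair
template = inp[['Key']].drop_duplicates().merge(relat, how='cross')

# Attach the existing data; template slots with no match become NaN
filled = template.merge(inp_occ, on=['Key', 'Indicator', 'occ'], how='left')

# Fill numeric columns with 0 and the text column with '0' (assumption)
value_cols = ['C1', 'C2', 'C3', 'C4', 'C5', 'C6']
filled[value_cols] = filled[value_cols].fillna(0)
filled['Descr'] = filled['Descr'].fillna('0')
filled = filled.drop(columns='occ')
print(filled)
```

Because this is a left merge from the template, any row of 'inp' whose indicator does not appear in 'IndiRelat' would be dropped; with the data in the question that does not happen. Rows come out in the order of 'IndiRelat' within each key.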
