[英]Create and fill rows of zeros with pandas
I have two dataframes, 'inp' and 'IndiRelat'.我有两个数据框,“inp”和“IndiRelat”。 The 'inp' are the entries of certain data.
“inp”是某些数据的条目。
import pandas as pd
inp = [[2, 'cvt' , -3, 5, 17, -2, -9, -0.2, 'RL'],
[2, 'cv' , 0, 0, 0, 0, 0, 0, 'LL'],
[2, 'sope' , 0, 0, 0, 0, 0, 0, 'SD'],
[2, 'wix+' ,-13,-13, 2, 1,-62, -0.5, 'WI'],
[2, 'wix-' , 0, 16, 6, 13, 0, 0.3, 'WI'],
[4, 'sope' ,-42, 0, 29, 0, 0, -13, 'SD'],
[4, 'cv' , 0, 0, 0, 0, 0, 0, 'LL'],
[4, 'cvt' , 0, 0, 0, 0, 0, -1, 'RL'],
[4, 'wix+' ,-18, -2, 19, 19, 3, -64, 'WI'],
[4, 'wix-' , 0,-30, -2, -2, 32, 0, 'WI']]
inp = pd.DataFrame(data = inp, columns = ['Key','Descr', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'Indicator'])
print(inp['Key'])
print(inp['Indicator'])
IndiRelat = ['SD', 'LL', 'RL', 'SS', 'RR', 'WI', 'WI', 'WI', 'WI', 'QU', 'QU']
IndiRelat = pd.DataFrame(IndiRelat)
I have created an 'inp' DataFrame, in the there is data for each key 'inp ['Key']' which relate to an indicator 'inp['indicator']'.我创建了一个 'inp' DataFrame,其中每个键 'inp ['Key']' 都有与指标 'inp['indicator']' 相关的数据。
The idea is to relate this with a second DataFrame 'IndiRelat, and create rows of zeros in case the 'IndiRelat' data is not in 'inp['indicator']'.这个想法是将此与第二个 DataFrame 'IndiRelat 相关联,并创建零行以防 'IndiRelat' 数据不在 'inp['indicator']' 中。
What I seek to get is something like that.我想要得到的就是这样的东西。
Index Descr C1 C2 C3 C4 C5 C6 Indicator
0 2 cvt -3 5 17 -2 -9 -0.2 RL
1 2 cv 0 0 0 0 0 0 LL
2 2 sope 0 0 0 0 0 0 SD
3 2 wix+ -13 -13 2 1 -62 -0.5 WI
4 2 wix- 0 16 6 13 0 0.3 WI
5 2 0 0 0 0 0 0 0 WI
6 2 0 0 0 0 0 0 0 WI
7 2 0 0 0 0 0 0 0 QU
8 2 0 0 0 0 0 0 0 QU
9 2 0 0 0 0 0 0 0 SS
10 2 0 0 0 0 0 0 0 RR
11 4 cvt -42 0 29 0 0 -13 RL
12 4 cv 0 0 0 0 0 0 LL
13 4 sope 0 0 0 0 0 -1 SD
14 4 wix+ -18 -2 19 19 3 -64 WI
15 4 wix- 0 -30 -2 -2 32 0 WI
16 4 0 0 0 0 0 0 0 WI
17 4 0 0 0 0 0 0 0 WI
18 4 0 0 0 0 0 0 0 QU
19 4 0 0 0 0 0 0 0 QU
20 4 0 0 0 0 0 0 0 SS
21 4 0 0 0 0 0 0 0 RR
I would greatly appreciate if you can help me with the idea and suggestions to get it, greetings.如果你能帮助我提出想法和建议,我将不胜感激,问候。
We try to do the cumcount
create the unique key then we do reindex
我们尝试做
cumcount
创建唯一键然后我们做reindex
inp['new'] = inp.groupby(['Key','Indicator']).cumcount()
IndiRelat[1] = IndiRelat.groupby(0).cumcount()
IndiRelat.columns = ['Indicator','new']
out = inp.set_index(['Key','Indicator','new']).unstack(level=0).reindex(pd.MultiIndex.from_frame(IndiRelat),fill_value=0).stack().reset_index().sort_values('Key')
out
Out[93]:
Indicator new Key Descr C1 C2 C3 C4 C5 C6
0 SD 0 2 sope 0 0 0 0 0 0.0
18 QU 0 2 0 0 0 0 0 0 0.0
16 WI 3 2 0 0 0 0 0 0 0.0
14 WI 2 2 0 0 0 0 0 0 0.0
12 WI 1 2 wix- 0 16 6 13 0 0.3
20 QU 1 2 0 0 0 0 0 0 0.0
8 RR 0 2 0 0 0 0 0 0 0.0
10 WI 0 2 wix+ -13 -13 2 1 -62 -0.5
6 SS 0 2 0 0 0 0 0 0 0.0
4 RL 0 2 cvt -3 5 17 -2 -9 -0.2
2 LL 0 2 cv 0 0 0 0 0 0.0
9 RR 0 4 0 0 0 0 0 0 0.0
5 RL 0 4 cvt 0 0 0 0 0 -1.0
11 WI 0 4 wix+ -18 -2 19 19 3 -64.0
13 WI 1 4 wix- 0 -30 -2 -2 32 0.0
3 LL 0 4 cv 0 0 0 0 0 0.0
15 WI 2 4 0 0 0 0 0 0 0.0
17 WI 3 4 0 0 0 0 0 0 0.0
1 SD 0 4 sope -42 0 29 0 0 -13.0
19 QU 0 4 0 0 0 0 0 0 0.0
7 SS 0 4 0 0 0 0 0 0 0.0
21 QU 1 4 0 0 0 0 0 0 0.0
An outer merge ought to do the trick.外部合并应该可以解决问题。 As per @BENY's comment and answer, we need to address non-uniqueness of the key:
根据@BENY 的评论和回答,我们需要解决密钥的非唯一性问题:
>>> df2 = IndiRelat.rename(columns={0: 'Indicator'})
>>> df2['dedup_indic'] = df2.groupby('Indicator').cumcount()
>>> df = inp.join(inp.groupby('Indicator').cumcount().rename('dedup_indic'))\
... .merge(df2, how='outer')
>>> df
Key Descr C1 C2 C3 C4 C5 C6 Indicator dedup_indic
0 2.0 cvt -3.0 5.0 17.0 -2.0 -9.0 -0.2 RL 0
1 2.0 cv 0.0 0.0 0.0 0.0 0.0 0.0 LL 0
2 2.0 sope 0.0 0.0 0.0 0.0 0.0 0.0 SD 0
3 2.0 wix+ -13.0 -13.0 2.0 1.0 -62.0 -0.5 WI 0
4 2.0 wix- 0.0 16.0 6.0 13.0 0.0 0.3 WI 1
5 4.0 sope -42.0 0.0 29.0 0.0 0.0 -13.0 SD 1
6 4.0 cv 0.0 0.0 0.0 0.0 0.0 0.0 LL 1
7 4.0 cvt 0.0 0.0 0.0 0.0 0.0 -1.0 RL 1
8 4.0 wix+ -18.0 -2.0 19.0 19.0 3.0 -64.0 WI 2
9 4.0 wix- 0.0 -30.0 -2.0 -2.0 32.0 0.0 WI 3
10 NaN NaN NaN NaN NaN NaN NaN NaN SS 0
11 NaN NaN NaN NaN NaN NaN NaN NaN RR 0
12 NaN NaN NaN NaN NaN NaN NaN NaN QU 0
13 NaN NaN NaN NaN NaN NaN NaN NaN QU 1
>>> df.fillna(0).drop(columns=['dedup_indic'])
Key Descr C1 C2 C3 C4 C5 C6 Indicator
0 2.0 cvt -3.0 5.0 17.0 -2.0 -9.0 -0.2 RL
1 2.0 cv 0.0 0.0 0.0 0.0 0.0 0.0 LL
2 2.0 sope 0.0 0.0 0.0 0.0 0.0 0.0 SD
3 2.0 wix+ -13.0 -13.0 2.0 1.0 -62.0 -0.5 WI
4 2.0 wix- 0.0 16.0 6.0 13.0 0.0 0.3 WI
5 4.0 sope -42.0 0.0 29.0 0.0 0.0 -13.0 SD
6 4.0 cv 0.0 0.0 0.0 0.0 0.0 0.0 LL
7 4.0 cvt 0.0 0.0 0.0 0.0 0.0 -1.0 RL
8 4.0 wix+ -18.0 -2.0 19.0 19.0 3.0 -64.0 WI
9 4.0 wix- 0.0 -30.0 -2.0 -2.0 32.0 0.0 WI
10 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 SS
11 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 RR
12 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 QU
13 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 QU
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.