Filling the missing rows in a pandas DataFrame
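import numpy as np
import pandas as pd

data = {
    'node1': [1, 1, 1, 2, 2, 5],
    'node2': [8, 16, 22, 5, 25, 10],
    'weight': [1, 1, 1, 1, 1, 1],
}
df = pd.DataFrame(data, columns=['node1', 'node2', 'weight'])
# Number the rows within each node1 group, then spread node2 across those positions.
# fillna(0) matches the zeros in the output shown; fillna(np.nan) would leave the NaNs in place.
df2 = df.assign(Cu=df.groupby('node1').cumcount()).set_index('Cu').groupby('node1') \
    .apply(lambda x: x['node2']).unstack('Cu').fillna(0)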
Output:
1 8.0 16.0 22.0
2 5.0 25.0 0.0
5 10.0 0.0 0.0
This is the output I am getting, but I require the following output:
1 8 16 22
2 5 25 0
3 0 0 0
4 0 0 0
5 10 0 0
Rows that are missing from the data, such as 3 and 4, should have all columns set to zero.
In [15]: idx = np.arange(df.node1.min(), df.node1.max()+1)
In [16]: df.pivot_table(index='node1',
                        columns=df.groupby('node1').cumcount(),
                        values='node2',
                        fill_value=0) \
           .reindex(idx) \
           .fillna(0)
Out[16]:
          0     1     2
node1
1       8.0  16.0  22.0
2       5.0  25.0   0.0
3       0.0   0.0   0.0
4       0.0   0.0   0.0
5      10.0   0.0   0.0
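The question asks for integer output, while the result above is floating point. Here is a minimal sketch of one way to get integers, assuming the same sample data as the question. It uses `pivot` rather than `pivot_table`, which relies on each node1/position pair being unique (true here), and the column name `pos` is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'node1': [1, 1, 1, 2, 2, 5],
    'node2': [8, 16, 22, 5, 25, 10],
    'weight': [1, 1, 1, 1, 1, 1],
})

# Every node id between the smallest and largest observed value, inclusive.
idx = np.arange(df['node1'].min(), df['node1'].max() + 1)

result = (
    # 'pos' (hypothetical name) is the position of each row within its node1 group.
    df.assign(pos=df.groupby('node1').cumcount())
      .pivot(index='node1', columns='pos', values='node2')
      .reindex(idx)   # adds the missing node ids (3 and 4) as all-NaN rows
      .fillna(0)      # replaces every NaN, old and new, with 0
      .astype(int)    # safe only once no NaN remains
)
print(result)
```

The cast to `int` has to come last, because a column that still holds NaN cannot be stored as a plain integer dtype.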
Here are a few ways of doing it.
Option 1
In [36]: idx = np.arange(df.node1.min(), df.node1.max()+1)
In [37]: df.groupby('node1')['node2'].apply(list).apply(pd.Series).reindex(idx).fillna(0)
Out[37]:
          0     1     2
node1
1       8.0  16.0  22.0
2       5.0  25.0   0.0
3       0.0   0.0   0.0
4       0.0   0.0   0.0
5      10.0   0.0   0.0
Option 2
In [39]: (df.groupby('node1')['node2'].apply(lambda x: pd.Series(x.values))
            .unstack().reindex(idx).fillna(0))
Out[39]:
          0     1     2
node1
1       8.0  16.0  22.0
2       5.0  25.0   0.0
3       0.0   0.0   0.0
4       0.0   0.0   0.0
5      10.0   0.0   0.0
Option 3
In [55]: pd.DataFrame.from_dict(
             {i: x.values for i, x in df.groupby('node1')['node2']},
             orient='index').reindex(idx).fillna(0)
Out[55]:
      0     1     2
1   8.0  16.0  22.0
2   5.0  25.0   0.0
3   0.0   0.0   0.0
4   0.0   0.0   0.0
5  10.0   0.0   0.0
Then measure efficiency and readability against your own use case.
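One way to act on that advice is to time each option with the standard `timeit` module. The sketch below is only illustrative: the function names `option1` to `option3` and the repetition count are arbitrary choices, and the sample frame is so small that the numbers say little about behaviour on large data.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'node1': [1, 1, 1, 2, 2, 5],
    'node2': [8, 16, 22, 5, 25, 10],
    'weight': [1, 1, 1, 1, 1, 1],
})
idx = np.arange(df['node1'].min(), df['node1'].max() + 1)


def option1():
    # Collect each group into a list, then expand the lists into columns.
    return (df.groupby('node1')['node2'].apply(list).apply(pd.Series)
              .reindex(idx).fillna(0))


def option2():
    # Re-number each group's values from 0, then unstack that inner level.
    return (df.groupby('node1')['node2'].apply(lambda x: pd.Series(x.values))
              .unstack().reindex(idx).fillna(0))


def option3():
    # Build the frame row by row from a dict of per-group arrays.
    return pd.DataFrame.from_dict(
        {i: x.values for i, x in df.groupby('node1')['node2']},
        orient='index').reindex(idx).fillna(0)


# Total seconds for 20 calls of each option (20 is an arbitrary choice).
timings = {f.__name__: timeit.timeit(f, number=20)
           for f in (option1, option2, option3)}
for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s for 20 runs')
```

For a meaningful comparison, run the same functions on a frame whose size and group structure resemble the real data.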