填充熊猫数据框中的缺失行

Question

data = { 
    'node1': [1, 1,1, 2,2,5],
 'node2': [8,16,22,5,25,10],
 'weight': [1,1,1,1,1,1], }
df = pd.DataFrame(data, columns = ['node1','node2','weight'])

df2=df.assign(Cu=df.groupby('node1').cumcount()).set_index('Cu').groupby('node1') \
      .apply(lambda x : x['node2']).unstack('Cu').fillna(np.nan)

Output: 输出：

1       8.0    16.0     22.0
2       5.0    25.0     0.0
5       10.0   0.0      0.0

This the output I am gettting but I require the output: 这是我正在获取的输出，但我需要输出：

1  8   16  22    
2  5   25  0    
3  0   0   0    
4  0   0   0    
5  10  0   0

The rows which are missing in the data like the 3,4 should have the columns as zeros 像3,4这样的数据中缺少的行的列应为零

Answer 1

In [15]: idx = np.arange(df.node1.min(), df.node1.max()+1)

In [16]: df.pivot_table(index='node1',
                        columns=df.groupby('node1').cumcount(), 
                        values='node2',
                        fill_value=0) \
           .reindex(idx) \
           .fillna(0)
Out[16]:
          0     1     2
node1
1       8.0  16.0  22.0
2       5.0  25.0   0.0
3       0.0   0.0   0.0
4       0.0   0.0   0.0
5      10.0   0.0   0.0

Answer 2

Here are few ways of doing it. 这里有几种方法。

Option 1 选项1

In [36]: idx = np.arange(df.node1.min(), df.node1.max()+1)

In [37]: df.groupby('node1')['node2'].apply(list).apply(pd.Series).reindex(idx).fillna(0)
Out[37]:
          0     1     2
node1
1       8.0  16.0  22.0
2       5.0  25.0   0.0
3       0.0   0.0   0.0
4       0.0   0.0   0.0
5      10.0   0.0   0.0

Option 2 选项2

In [39]: (df.groupby('node1')['node2'].apply(lambda x: pd.Series(x.values))
            .unstack().reindex(idx).fillna(0))
Out[39]:
          0     1     2
node1
1       8.0  16.0  22.0
2       5.0  25.0   0.0
3       0.0   0.0   0.0
4       0.0   0.0   0.0
5      10.0   0.0   0.0

Option 3 选项3

In [55]: pd.DataFrame.from_dict(
               {i: x.values  for i, x in df.groupby('node1')['node2']},
                orient='index').reindex(idx).fillna(0)
Out[55]:
      0     1     2
1   8.0  16.0  22.0
2   5.0  25.0   0.0
3   0.0   0.0   0.0
4   0.0   0.0   0.0
5  10.0   0.0   0.0

And, measure the efficiency, readability based on your usecase. 并且，根据您的用例来衡量效率和可读性。

填充熊猫数据框中的缺失行

问题描述

2 个解决方案

解决方案1
3 2017-10-07 12:59:36

解决方案2
3 已采纳 2017-10-07 13:07:32

填充熊猫数据框中的缺失行

问题描述

2 个解决方案

解决方案1 3 2017-10-07 12:59:36

解决方案2 3 已采纳 2017-10-07 13:07:32

解决方案1
3 2017-10-07 12:59:36

解决方案2
3 已采纳 2017-10-07 13:07:32