Missing data: insert rows in Pandas and fill them with NaN

Question

I'm new to Python and Pandas, so there might be a simple solution that I don't see.

I have a number of discontinuous datasets that look like this:

ind A    B  C  
0   0.0  1  3  
1   0.5  4  2  
2   1.0  6  1  
3   3.5  2  0  
4   4.0  4  5  
5   4.5  3  3

What I'm looking for is a way to get the following:

ind A    B  C  
0   0.0  1  3  
1   0.5  4  2  
2   1.0  6  1  
3   1.5  NAN NAN  
4   2.0  NAN NAN  
5   2.5  NAN NAN  
6   3.0  NAN NAN  
7   3.5  2  0  
8   4.0  4  5  
9   4.5  3  3

The problem is that the gap in A varies from dataset to dataset, both in position and in length.

Answer 1

set_index and reset_index are your friends.

from numpy import arange
from pandas import DataFrame, Index
df = DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5], "B": [1, 4, 6, 2, 4, 3], "C": [3, 2, 1, 0, 5, 3]})

First, move column A into the index:

In [64]: df.set_index("A")
Out[64]: 
     B  C
 A        
0.0  1  3
0.5  4  2
1.0  6  1
3.5  2  0
4.0  4  5
4.5  3  3

Then reindex with a new index; rows for index values that are missing from the original data are filled with NaN. We use an Index object because it lets us name the index, and that name is used in the next step.

In [66]: new_index = Index(arange(0,5,0.5), name="A")
In [67]: df.set_index("A").reindex(new_index)
Out[67]: 
      B   C
0.0   1   3
0.5   4   2
1.0   6   1
1.5 NaN NaN
2.0 NaN NaN
2.5 NaN NaN
3.0 NaN NaN
3.5   2   0
4.0   4   5
4.5   3   3

Finally, move the index back into the columns with reset_index. Because we named the index, it comes back as column A automatically:

In [69]: df.set_index("A").reindex(new_index).reset_index()
Out[69]: 
       A   B   C
0    0.0   1   3
1    0.5   4   2
2    1.0   6   1
3    1.5 NaN NaN
4    2.0 NaN NaN
5    2.5 NaN NaN
6    3.0 NaN NaN
7    3.5   2   0
8    4.0   4   5
9    4.5   3   3
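The three steps above can be wrapped into a small reusable function. This is a sketch, not part of the original answer: the name `fill_gaps` is hypothetical, and it assumes the A values are unique and lie on a regular grid with a known step. It takes the start and end of the grid from each dataset's own minimum and maximum, so gaps can differ in position and length between datasets.

```python
import numpy as np
import pandas as pd


def fill_gaps(df: pd.DataFrame, col: str, step: float) -> pd.DataFrame:
    """Hypothetical helper: insert NaN rows so that `col` covers a regular grid."""
    start, stop = df[col].min(), df[col].max()
    # arange excludes its stop value, so extend it by one step to keep the maximum
    grid = pd.Index(np.arange(start, stop + step, step), name=col)
    # Move `col` into the index, reindex onto the full grid (new rows get NaN),
    # then turn the named index back into a column
    return df.set_index(col).reindex(grid).reset_index()


df = pd.DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5],
                   "B": [1, 4, 6, 2, 4, 3],
                   "C": [3, 2, 1, 0, 5, 3]})
result = fill_gaps(df, "A", 0.5)
print(result)
```

Two side effects are worth knowing: B and C become float columns, because NaN cannot be stored in an integer column, and reindex matches labels exactly, so with a step that is not exactly representable in binary (such as 0.1) floating-point rounding in the generated grid may fail to match existing values.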

Answer 2

Using the answer by EdChum, I created the following function:

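import numpy as np
import pandas as pd

def fill_missing_range(df, field, range_from, range_to, range_step=1, fill_with=0):
    # All expected values of `field` (range_to itself is excluded, as in np.arange)
    full_range = pd.DataFrame({field: np.arange(range_from, range_to, range_step)})
    # The right merge keeps every expected value; values missing from df get NaN
    merged = df.merge(full_range, how='right', on=field)
    return merged.sort_values(by=field).reset_index(drop=True).fillna(fill_with)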

Example usage:

fill_missing_range(df, 'A', 0.0, 5.0, 0.5, np.nan)  # range_to is exclusive, so 5.0 keeps the 4.5 row
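Because `np.arange` excludes its stop value and a right merge keeps only the keys present in the right-hand frame, any row of `df` whose value falls outside the generated range is silently discarded. A minimal self-contained check of that behaviour, independent of `fill_missing_range` itself:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5],
                   "B": [1, 4, 6, 2, 4, 3],
                   "C": [3, 2, 1, 0, 5, 3]})

# Stop value 4.5 is excluded by arange, so this grid ends at 4.0
short_grid = pd.DataFrame({"A": np.arange(0.0, 4.5, 0.5)})
# Stop value 5.0 makes the grid end at 4.5
full_grid = pd.DataFrame({"A": np.arange(0.0, 5.0, 0.5)})

# A right merge keeps only the keys of the right frame
lost = df.merge(short_grid, how="right", on="A")
kept = df.merge(full_grid, how="right", on="A")

print(len(lost), len(kept))  # prints: 9 10
```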

Answer 3

Here I generate a new dataframe containing the complete A column, right-merge your original df into it, and then re-sort the result:

In [177]:

import numpy as np
import pandas as pd
grid = pd.DataFrame({'A': np.arange(df.iloc[0]['A'], df.iloc[-1]['A'] + 0.5, 0.5)})
df.merge(grid, how='right', on='A').sort_values(by='A').reset_index(drop=True)
Out[177]:
     A   B   C
0  0.0   1   3
1  0.5   4   2
2  1.0   6   1
3  1.5 NaN NaN
4  2.0 NaN NaN
5  2.5 NaN NaN
6  3.0 NaN NaN
7  3.5   2   0
8  4.0   4   5
9  4.5   3   3

In the general case you can adjust the arguments to arange, which takes a start value, an end value and a step. Note that I added 0.5 to the end value: arange produces a half-open range, [start, end), so without it the last value (4.5) would be left out.
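The half-open behaviour of `np.arange` is easy to check directly. The second part shows an alternative that the answer does not use: computing the number of points and calling `np.linspace`, which includes the end point by default instead of stepping past it. The point-count formula assumes the range is an exact multiple of the step.

```python
import numpy as np

start, stop, step = 0.0, 4.5, 0.5

# arange covers [start, stop): the stop value itself is not produced
half_open = np.arange(start, stop, step)
# Adding one step to the stop value brings 4.5 back in
closed = np.arange(start, stop + step, step)

# Alternative: count the points and let linspace include the end point
n_points = int(round((stop - start) / step)) + 1
via_linspace = np.linspace(start, stop, n_points)

print(half_open[-1], closed[-1], via_linspace[-1])  # 4.0 4.5 4.5
```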

A more general method could look like this:

In [197]:

step = 0.5
# Keep A as a column (drop=False) while also using it as the index, then reindex onto the full grid
df = df.set_index(keys='A', drop=False).reindex(np.arange(df.iloc[0]['A'], df.iloc[-1]['A'] + step, step))
# Gap rows have NaN in column A, so copy the complete grid from the index back into A
df['A'] = df.index
df.reset_index(drop=True)
Out[197]:
     A   B   C
0  0.0   1   3
1  0.5   4   2
2  1.0   6   1
3  1.5 NaN NaN
4  2.0 NaN NaN
5  2.5 NaN NaN
6  3.0 NaN NaN
7  3.5   2   0
8  4.0   4   5
9  4.5   3   3

Here we set column A as the index without dropping it, and then reindex the df using the arange function.
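Since the question says the gaps differ from dataset to dataset, it may also help not to hard-code the step. A sketch, assuming the smallest positive difference between sorted A values is the sampling step (true for the example data, but an assumption in general); `infer_step` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def infer_step(values: pd.Series) -> float:
    """Hypothetical helper: guess the grid step as the smallest positive gap."""
    diffs = np.diff(np.sort(values.to_numpy()))
    return float(diffs[diffs > 0].min())


df = pd.DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5],
                   "B": [1, 4, 6, 2, 4, 3],
                   "C": [3, 2, 1, 0, 5, 3]})

step = infer_step(df["A"])
first, last = df["A"].iloc[0], df["A"].iloc[-1]
# Same reindex idea as above, with the step taken from the data
grid = np.arange(first, last + step, step)
# rename_axis makes sure the reindexed index is called "A" before it becomes a column again
filled = df.set_index("A").reindex(grid).rename_axis("A").reset_index()
print(step, len(filled))  # 0.5 10
```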

Answer 4

This question was asked a long time ago, but I have a simple solution that's worth mentioning: you can use NumPy's NaN. For instance:

import numpy as np
# i is a row label and j a column label; .loc sets that single cell to NaN
df.loc[i, j] = np.nan

will do the trick.
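Assigning `np.nan` through `.loc` to a row label that does not exist yet makes pandas append a new row ("setting with enlargement"), and the row's other columns are filled with NaN as well. A sketch of how that could insert one missing row into the question's data, assuming A has first been moved into the index; this illustration is not part of the original answer, and filling a whole gap this way takes one assignment per missing label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 0.5, 1.0, 3.5, 4.0, 4.5],
                   "B": [1, 4, 6, 2, 4, 3],
                   "C": [3, 2, 1, 0, 5, 3]}).set_index("A")

# 1.5 is not in the index yet, so .loc appends a new row;
# column C of that row is filled with NaN automatically
df.loc[1.5, "B"] = np.nan
# The new row is appended at the end; sort to put it back in order
df = df.sort_index()
print(df)
```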


4 answers

Answer 1: score 44, accepted, 2014-09-18 14:58:28
Answer 2: score 5, 2017-03-24 14:07:05
Answer 3: score 2, 2014-09-18 10:28:40
Answer 4: score -1, 2020-07-28 19:14:32
