Finding consecutive NaNs in a pandas DataFrame

Question

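I want to find runs of consecutive NaNs in each column of my DataFrame and label every NaN with the length of the run it belongs to. For example,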

>>> df = pd.DataFrame([[np.nan, 2, np.nan],
...                    [3, 4, np.nan],
...                    [np.nan, np.nan, np.nan],
...                    [np.nan, 3, np.nan]],
...                    columns=list('ABC'))
>>> df
     A    B   C 
0  NaN  2.0 NaN 
1  3.0  4.0 NaN 
2  NaN  NaN NaN 
3  NaN  3.0 NaN

would give

>>> df
     A    B   C 
0  1.0  NaN 4.0 
1  NaN  NaN 4.0 
2  2.0  1.0 4.0 
3  2.0  NaN 4.0

Answer 1

Use:

a = df.isnull()
# label runs of equal values, count rows per run, keep counts only on NaN cells
b = a.ne(a.shift()).cumsum().apply(lambda x: x.map(x.value_counts())).where(a)
print(b)
     A    B  C
0  1.0  NaN  4
1  NaN  NaN  4
2  2.0  1.0  4
3  2.0  NaN  4

Details:

# give every run of consecutive identical values its own label
print (a.ne(a.shift()).cumsum())
   A  B  C
0  1  1  1
1  2  1  1
2  3  2  1
3  3  3  1

# count the rows carrying each label, per column, and map the counts back
print (a.ne(a.shift()).cumsum().apply(lambda x: x.map(x.value_counts())))
   A  B  C
0  1  2  4
1  1  2  4
2  2  1  4
3  2  1  4

# keep the counts only where the original value was NaN (mask a)
print (a.ne(a.shift()).cumsum().apply(lambda x: x.map(x.value_counts())).where(a))
     A    B  C
0  1.0  NaN  4
1  NaN  NaN  4
2  2.0  1.0  4
3  2.0  NaN  4
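
The three steps above can be wrapped into a small helper. This is only a minimal sketch: the function name `nan_run_lengths` is made up for illustration and is not part of pandas, and the sample frame is the one from the question.

```python
import numpy as np
import pandas as pd


def nan_run_lengths(df: pd.DataFrame) -> pd.DataFrame:
    """For every NaN cell, return the length of the consecutive NaN run it sits in."""
    is_nan = df.isnull()
    # A new run starts wherever the mask differs from the row above.
    run_id = is_nan.ne(is_nan.shift()).cumsum()
    # Replace each run label by the number of rows carrying it, column by column.
    run_len = run_id.apply(lambda col: col.map(col.value_counts()))
    # Keep the lengths only where the original value was NaN.
    return run_len.where(is_nan)


df = pd.DataFrame([[np.nan, 2, np.nan],
                   [3, 4, np.nan],
                   [np.nan, np.nan, np.nan],
                   [np.nan, 3, np.nan]],
                  columns=list('ABC'))
result = nan_run_lengths(df)
print(result)
```

Naming the intermediate frames makes it easier to inspect each step separately if the result looks wrong.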

Alternative for a single column:

a = df['A'].isnull()
b = a.ne(a.shift()).cumsum()
c = b.map(b.value_counts()).where(a)

print (c)
0    1.0
1    NaN
2    2.0
3    2.0
Name: A, dtype: float64

Answer 2

IIUC ... groupby + mask + isnull

df.apply(lambda x :x.groupby(x.isnull().diff().ne(0).cumsum()).transform(len).mask(~x.isnull()))
Out[751]: 
     A    B    C
0  1.0  NaN  4.0
1  NaN  NaN  4.0
2  2.0  1.0  4.0
3  2.0  NaN  4.0

For a single column:

df.A.groupby(df.A.isnull().diff().ne(0).cumsum()).transform(len).mask(~df.A.isnull())
Out[756]: 
0    1.0
1    NaN
2    2.0
3    2.0
Name: A, dtype: float64
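
A variant of the same groupby idea, sketched under two assumptions of mine rather than taken from the answer: run labels are built with `ne(shift())` instead of `diff()`, so the result does not depend on how `diff` treats boolean data, and the run length comes from `transform('size')` instead of `transform(len)`. The helper name `nan_runs_groupby` is hypothetical.

```python
import numpy as np
import pandas as pd


def nan_runs_groupby(col: pd.Series) -> pd.Series:
    """Label each NaN in one column with the length of its consecutive NaN run."""
    is_nan = col.isnull()
    # Consecutive equal mask values share one group label.
    run_id = is_nan.ne(is_nan.shift()).cumsum()
    # Size of each group, broadcast back onto the original index.
    run_len = is_nan.groupby(run_id).transform('size')
    # Blank out the rows that were not NaN to begin with.
    return run_len.where(is_nan)


df = pd.DataFrame([[np.nan, 2, np.nan],
                   [3, 4, np.nan],
                   [np.nan, np.nan, np.nan],
                   [np.nan, 3, np.nan]],
                  columns=list('ABC'))
result = df.apply(nan_runs_groupby)
print(result)
```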

Answer 3

I'm not sure this is particularly elegant, but here is how I did it:

def f(ds):
    ds = ds.isnull()
    splits = np.split(ds, np.where(ds == False)[0])
    counts = [np.sum(v) for v in splits]
    return pd.concat([pd.Series(split).replace({False: np.nan, True: count}) 
                      for split, count in zip(splits, counts)])

df.apply(lambda x: f(x))

Explanation:

# Binarize the array
ds = ds.isnull()

# Split the array at every False (positions that were not NaN)
splits = np.split(ds, np.where(ds == False)[0])

# Now just count the number of True values
counts = [np.sum(v) for v in splits]

# Concatenate series that contains the requested values
pd.concat([pd.Series(split).replace({False: np.nan, True: count}) 
           for split, count in zip(splits, counts)])
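
Since `np.split` is applied to a pandas Series here, its behaviour may depend on the installed NumPy and pandas versions. The split-and-count idea can also be sketched on a plain boolean array as a run-length encoding. This is an alternative of mine, not the answer's code, and `nan_runs_numpy` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def nan_runs_numpy(col: pd.Series) -> pd.Series:
    """Run-length encode the NaN mask of one column with plain NumPy."""
    mask = col.isnull().to_numpy()
    # Positions where a run starts: the first row and every change in the mask.
    starts = np.flatnonzero(np.r_[True, mask[1:] != mask[:-1]])
    # Length of a run = distance to the next start (or to the end of the column).
    lengths = np.diff(np.r_[starts, mask.size])
    # Give every row the length of its own run, then blank the non-NaN rows.
    per_row = np.repeat(lengths, lengths).astype(float)
    per_row[~mask] = np.nan
    return pd.Series(per_row, index=col.index, name=col.name)


df = pd.DataFrame([[np.nan, 2, np.nan],
                   [3, 4, np.nan],
                   [np.nan, np.nan, np.nan],
                   [np.nan, 3, np.nan]],
                  columns=list('ABC'))
result = df.apply(nan_runs_numpy)
print(result)
```

Working on the underlying array avoids building and concatenating one small Series per run.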


3 solutions

Solution 1
3 (accepted) 2017-12-05 20:07:17

Solution 2
2 2017-12-05 20:02:20

Solution 3
2 2017-12-05 20:31:04
