Replace negative values and "blocks" of consecutive zeros up to first positive value in pandas series with NaN

I have a pandas dataframe and I would like to identify all negative values and replace them with NaN. Additionally, all zeros that follow a negative value should be replaced with NaN as well, up until the first positive value occurs.

I think it should be possible to achieve my goal using a for loop over all negative values in the dataframe.

For example, for the negative value with index label 1737, I could use something like this:

# values from the negative value (index label 1737) onward
tail = df['counter_diff'].loc[1737:]
# index label of the first value greater than zero
first_index = next(label for label, val in tail.items() if val > 0)

And then fill the values from index 1737 up to first_index with NaN.
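For reference, the loop idea can be sketched as a plain-Python baseline. This is only an illustration under stated assumptions: the helper name `blank_negative_runs` is hypothetical, the index labels are assumed to be unique, and the sample data is the example series used in this question.

```python
import numpy as np
import pandas as pd


def blank_negative_runs(series):
    """Hypothetical loop-based baseline: set negatives, and the zeros that
    follow them up to the next positive value, to NaN."""
    out = series.astype(float)  # NaN needs a float dtype; astype returns a copy
    in_run = False  # True while inside a "negative, then zeros" block
    for label, val in series.items():
        if val < 0:
            in_run = True    # a negative value starts (or continues) a block
        elif val > 0:
            in_run = False   # the first positive value ends the block
        # val == 0 leaves in_run unchanged
        if in_run:
            out.loc[label] = np.nan  # assumes unique index labels
    return out


s = pd.Series({0: 1, 2: 3, 3: -1, 4: 0, 5: 0, 7: 1, 9: 3, 10: 0, 11: -2, 14: 1})
looped = blank_negative_runs(s)
```

This visits every row once in Python, which is easy to follow but likely slow on a large dataframe.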

However, my dataframe is very large, so I was wondering whether there is a more computationally efficient method that leverages pandas.

This is an example of the input:

# input column
In[]
pd.Series({0 : 1, 2 : 3, 3 : -1, 4 : 0, 5 : 0, 7 : 1, 9 : 3, 10 : 0, 11 : -2, 14 : 1})

Out[]
0     1
2     3
3    -1
4     0
5     0
7     1
9     3
10    0
11   -2
14    1
dtype: int64

And the desired output:

# desired output
In[]
pd.Series({0: 1, 2: 3, 3: np.nan, 4: np.nan, 5: np.nan, 7: 1, 9: 3, 10: 0, 11: np.nan, 14: 1})

Out[]
0     1.0
2     3.0
3     NaN
4     NaN
5     NaN
7     1.0
9     3.0
10    0.0
11    NaN
14    1.0
dtype: float64

Any help would be appreciated!

You could mask all 0s (turning them into NaN) and forward fill them with ffill, so that each zero takes on the last nonzero value before it, and then check which values in the result are less than 0. That flags every negative value together with the zeros that follow it. Then use the resulting boolean Series to mask the original Series:

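import pandas as pd

s = pd.Series({0 : 1, 2 : 3, 3 : -1, 4 : 0, 5 : 0, 7 : 1, 9 : 3, 10 : 0, 11 : -2, 14 : 1})

# inner mask hides the zeros, ffill carries the last nonzero value forward,
# lt(0) flags the positions to blank, outer mask sets them to NaN
s.mask(s.mask(s.eq(0)).ffill().lt(0))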

0     1.0
2     3.0
3     NaN
4     NaN
5     NaN
7     1.0
9     3.0
10    0.0
11    NaN
14    1.0
dtype: float64
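To see why the one-liner works, it can be unpacked into its intermediate steps. The variable names below (`nonzero`, `last_nonzero`, `to_blank`, `result`) are illustrative only and not part of the original answer.

```python
import numpy as np
import pandas as pd

s = pd.Series({0: 1, 2: 3, 3: -1, 4: 0, 5: 0, 7: 1, 9: 3, 10: 0, 11: -2, 14: 1})

# Step 1: hide the zeros (they become NaN), leaving only the nonzero values.
nonzero = s.mask(s.eq(0))

# Step 2: forward fill, so each zero inherits the last nonzero value before it.
last_nonzero = nonzero.ffill()

# Step 3: True for negatives and for zeros whose last nonzero predecessor is negative.
to_blank = last_nonzero.lt(0)

# Step 4: replace the flagged positions in the original Series with NaN.
result = s.mask(to_blank)
```

Zeros at the very start of the Series have nothing to inherit, so they stay NaN after the forward fill; since NaN is not less than 0, they are not flagged and keep their original value of 0. For a dataframe column such as `df['counter_diff']` from the question, the same chain can be applied to that column and assigned back.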
