
Pandas replace values in dataframe timeseries

I have a pandas dataframe df with pandas.tseries.index.DatetimeIndex as index.

The data is like this:

Time                    Open     High      Low    Close     Volume
2007-04-01 21:02:00    1.968    2.389    1.968    2.389  18.300000
2007-04-01 21:03:00  157.140  157.140  157.140  157.140   2.400000

....

I want to replace one data point, let's say 2.389 in column Close, with NaN:

In: df["Close"].replace(2.389, np.nan)
Out: 2007-04-01 21:02:00      2.389
     2007-04-01 21:03:00    157.140

replace did not change 2.389 to NaN. What's wrong?

replace matches values by exact equality, so it may miss floats: the value you see in the repr of the DataFrame is rounded for display and may not be exactly equal to the underlying float. For example, the actual Close value might be:

In [141]: df = pd.DataFrame({'Close': [2.389000000001]})

yet the repr of df looks like:

In [142]: df
Out[142]: 
   Close
0  2.389
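A minimal sketch of the failure, reusing the hypothetical value 2.389000000001 from the example above: the display looks like 2.389, but exact comparison (which is what replace relies on) finds no match.

```python
import numpy as np
import pandas as pd

# A value that differs from 2.389 only far beyond the displayed digits
df = pd.DataFrame({'Close': [2.389000000001]})

print(df)                         # displays as 2.389 (display rounding)
print(repr(df.loc[0, 'Close']))   # the full repr shows the extra digits

# Exact equality fails, so replace() finds nothing to replace
print((df['Close'] == 2.389).any())      # False
result = df['Close'].replace(2.389, np.nan)
print(result.isna().any())               # False: the value is left untouched
```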

So instead of checking for float equality, it is usually better to check for closeness:

In [150]: import numpy as np
In [151]: mask = np.isclose(df['Close'], 2.389)

In [152]: mask
Out[152]: array([ True], dtype=bool)
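np.isclose treats a and b as equal when |a - b| <= atol + rtol * |b|, with defaults rtol=1e-05 and atol=1e-08. The sketch below (values chosen for illustration) shows that the defaults also accept values differing in the fifth decimal place, and how to tighten them if nearby prices must stay distinct.

```python
import numpy as np

value = 2.389000000001

# Exact comparison fails because of the tiny representation difference
print(value == 2.389)                                  # False
print(np.isclose(value, 2.389))                        # True

# With the defaults, the tolerance near 2.389 is about 2.4e-05,
# so a value 1e-05 away is still considered "close"
print(np.isclose(2.38901, 2.389))                      # True

# A purely absolute, much tighter tolerance keeps them distinct
print(np.isclose(2.38901, 2.389, rtol=0, atol=1e-9))   # False
```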

You can then use the boolean mask to select and change the desired values:

In [145]: df.loc[mask, 'Close'] = np.nan

In [146]: df
Out[146]: 
   Close
0    NaN
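Putting the steps together on data shaped like the question's (the two sample rows rebuilt by hand with a DatetimeIndex), a minimal sketch might look like this. Note that only the Close column is changed; the matching 2.389 in High is left alone.

```python
import numpy as np
import pandas as pd

# Rebuild the two sample rows from the question, indexed by a DatetimeIndex
idx = pd.to_datetime(['2007-04-01 21:02:00', '2007-04-01 21:03:00'])
df = pd.DataFrame(
    {
        'Open':   [1.968, 157.140],
        'High':   [2.389, 157.140],
        'Low':    [1.968, 157.140],
        'Close':  [2.389, 157.140],
        'Volume': [18.3, 2.4],
    },
    index=idx,
)
df.index.name = 'Time'

# Boolean mask of rows whose Close is numerically close to 2.389
mask = np.isclose(df['Close'], 2.389)

# Overwrite only the Close column in those rows
df.loc[mask, 'Close'] = np.nan

print(df)
```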

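You need to assign the result back to df['Close'], or pass the parameter inplace=True: df['Close'].replace(2.389, np.nan, inplace=True). Note that in newer pandas versions with Copy-on-Write, inplace=True on a selected column may not propagate back to df, so assigning the result is the safer option.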

e.g.:

In [5]:

df['Close'] = df['Close'].replace(2.389, np.nan)
df['Close']
Out[5]:
0      2.389
1    157.140
Name: Close, dtype: float64

Most pandas operations return a new object rather than modifying the original, and some accept the inplace parameter.
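A small sketch of that behaviour. It uses 2.5 instead of 2.389 because 2.5 is exactly representable as a float, so the exact-match issue from the other answer cannot interfere: replace leaves df untouched until the result is assigned back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Close': [2.5, 157.14]})

# replace() returns a new Series; df itself is not modified
replaced = df['Close'].replace(2.5, np.nan)
print(df['Close'].isna().any())   # False: original unchanged
print(replaced.isna().any())      # True

# Assign the result back to keep the change
df['Close'] = df['Close'].replace(2.5, np.nan)
print(df['Close'].isna().any())   # True
```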

Check the docs: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.replace.html#pandas.Series.replace


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM