pandas df.fillna() not replacing na values
I have a dataframe that looks like this (for clarity: it has 5 rows and 8 columns):
BTC-USD_close BTC-USD_volume LTC-USD_close LTC-USD_volume \
time
1528968660 6489.549805 0.587100 96.580002 9.647200
1528968720 6487.379883 7.706374 96.660004 314.387024
1528968780 6479.410156 3.088252 96.570000 77.129799
1528968840 6479.410156 1.404100 96.500000 7.216067
1528968900 6479.979980 0.753000 96.389999 524.539978
BCH-USD_close BCH-USD_volume ETH-USD_close ETH-USD_volume
time
1528968660 871.719971 5.675361 NaN NaN
1528968720 870.859985 26.856577 486.01001 26.019083
1528968780 870.099976 1.124300 486.00000 8.449400
1528968840 870.789978 1.749862 485.75000 26.994646
1528968900 870.000000 1.680500 486.00000 77.355759
I would like to replace the NaN values in the ETH-USD_close and ETH-USD_volume columns. However, when I call df.fillna(method='ffill', inplace=True), nothing seems to happen; the missing values are still there and nothing changes in the columns when I step through the program with a debugger.
When I use df.isna() to check whether pandas interprets my NaN values correctly, this does seem to be the case. Here is the output of print(df.isna()) for the first few rows:
BTC-USD_close BTC-USD_volume LTC-USD_close LTC-USD_volume \
time
1528968660 False False False False
1528968720 False False False False
1528968780 False False False False
1528968840 False False False False
1528968900 False False False False
BCH-USD_close BCH-USD_volume ETH-USD_close ETH-USD_volume
time
1528968660 False False True True
1528968720 False False False False
1528968780 False False False False
1528968840 False False False False
1528968900 False False False False
A call like df.dropna(inplace=True) does remove the entire row, but this is not what I want. Any suggestions?
EDIT: In case anyone wants to reproduce the problem, you can download the data from https://pythonprogramming.net/static/downloads/machine-learning-data/crypto_data.zip , unzip it, and run the following code in the same directory:
import pandas as pd

# Initialize empty df
main_df = pd.DataFrame()

ratios = ["BTC-USD", "LTC-USD", "BCH-USD", "ETH-USD"]
for ratio in ratios:
    # SET CORRECT PATH HERE
    dataset = f'crypto_data/{ratio}.csv'

    # Use f-strings so we know which close/volume is which
    df_ratio = pd.read_csv(dataset, names=['time', 'low', 'high', 'open', f"{ratio}_close", f"{ratio}_volume"])

    # Set time as index so we can join them on this shared time
    df_ratio.set_index("time", inplace=True)

    # Ignore the other columns besides price and volume
    df_ratio = df_ratio[[f"{ratio}_close", f"{ratio}_volume"]]

    if main_df.empty:
        main_df = df_ratio
    else:
        main_df = main_df.join(df_ratio)

main_df.fillna(method='ffill', inplace=True)  # THIS DOESN'T SEEM TO WORK
Ah. You cannot ffill a NaN value if it is the first value of a series: it has no previous value to copy forward.

Using .ffill().bfill() could solve this, but it might create false data, because the first row would then be filled with a value observed later.
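To make the point above concrete, here is a minimal sketch on a toy frame. The values are shortened from the question's sample and the frame is built by hand rather than read from the CSV files, so treat it as an illustration, not a reproduction of the asker's data. It uses the DataFrame.ffill() and DataFrame.bfill() methods directly, which should behave the same as fillna(method='ffill') for this purpose.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the question: the first ETH row is missing.
df = pd.DataFrame(
    {
        "BTC-USD_close": [6489.55, 6487.38, 6479.41],
        "ETH-USD_close": [np.nan, 486.01, 486.00],
    },
    index=pd.Index([1528968660, 1528968720, 1528968780], name="time"),
)

# Forward fill copies the previous valid value downwards.
forward = df.ffill()
# The leading NaN has no earlier value to copy, so it survives.
leading_nan_survives = bool(forward["ETH-USD_close"].isna().iloc[0])

# Forward fill, then backward fill to cover leading gaps.
both = df.ffill().bfill()
# bfill copies the next valid value backwards into the first row.
remaining_nans = int(both.isna().sum().sum())
first_eth = float(both["ETH-USD_close"].iloc[0])

print(leading_nan_survives, remaining_nans, first_eth)
# → True 0 486.01
```

Note that the backfilled first row now holds a price observed one minute later. If the frame feeds a time-series model, that is a form of look-ahead, so dropping only the leading rows that lack data may be the safer choice.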