Python Pandas Dataframe: change a NaN cell value with a different column from previous row
import pandas as pd
import numpy as np
data = np.array([['', 'Col1', 'Col2', 'Col3'],
                 ['Row1', 1, 2, 3],
                 ['Row2', np.nan, 5, 6],
                 ['Row3', 7, 8, 9]
                 ])
df = pd.DataFrame(data=data[1:, 1:],
                  index=data[1:, 0],
                  columns=data[0, 1:])
Output:
Col1 Col2 Col3
Row1 1 2 3
Row2 nan 5 6
Row3 7 8 9
I would like to loop through the dataframe and replace the NaN value in Row2['Col1'] (the current row in the loop) with the value in Row1['Col3'] (a different column from the previous row in the loop).
One way you can do this is to use stack, ffill, and unstack:
df.stack(dropna=False).ffill().unstack()
Output:
Col1 Col2 Col3
Row1 1 2 3
Row2 3 5 6
Row3 7 8 9
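To make the idea behind this answer concrete, here is a minimal sketch that gets the same effect for this example by flattening the values row by row, forward-filling, and reshaping. It assumes the frame holds a real NaN (the sample frame is rebuilt from numeric data here, which is my assumption), and the names flat and filled are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: the frame is built from numeric data, so the gap is a real NaN.
df = pd.DataFrame(
    [[1, 2, 3], [np.nan, 5, 6], [7, 8, 9]],
    index=["Row1", "Row2", "Row3"],
    columns=["Col1", "Col2", "Col3"],
)

# Flatten row by row, forward-fill, then restore the original shape.
flat = pd.Series(df.to_numpy().ravel())
filled = pd.DataFrame(
    flat.ffill().to_numpy().reshape(df.shape),
    index=df.index,
    columns=df.columns,
)
print(filled)
```

Note that a forward fill over the flattened values takes the cell immediately before the gap. That is the previous row's last column only when the NaN sits in the first column; a NaN elsewhere would be filled from the cell to its left in the same row.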
There is one more thing you need to solve before replacing nan:
1st: You are building the data with np.array, and an array does not accept mixed types, which means your nan here is no longer np.nan; it is the string 'nan':
df.applymap(type)
Out[1244]:
Col1 Col2 Col3
Row1 <class 'str'> <class 'str'> <class 'str'>
Row2 <class 'str'> <class 'str'> <class 'str'>
Row3 <class 'str'> <class 'str'> <class 'str'>
df=df.replace('nan',np.nan)
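As a small illustration of this point, the sketch below shows the coercion on a two-row array and one possible way to avoid it, by passing only the numbers to the DataFrame and giving the labels separately. The variable name mixed is hypothetical, and building the frame this way is a suggestion of mine, not something the question requires.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single dtype, so mixing labels and numbers yields strings.
mixed = np.array([["Row1", 1], ["Row2", np.nan]])
print(mixed.dtype.kind)       # 'U' means unicode string
print(mixed[1, 1] == "nan")   # the NaN has become text

# Passing only the numbers, with labels given separately, keeps NaN as a float.
df = pd.DataFrame(
    [[1, 2, 3], [np.nan, 5, 6], [7, 8, 9]],
    index=["Row1", "Row2", "Row3"],
    columns=["Col1", "Col2", "Col3"],
)
print(df.isna().sum().sum())  # number of real NaN cells
```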
2nd: I am using np.roll + combine_first to fill the nan:
df.combine_first(pd.DataFrame(np.roll(np.concatenate(df.values),1).reshape(3,3),index=df.index,columns=df.columns))
Out[1240]:
Col1 Col2 Col3
Row1 1 2 3
Row2 3 5 6
Row3 7 8 9
I apologize for not posting the actual data from my dataset, so here it is:
Open High Low Last Change Settle Volume
Date
2017-05-22 51.97 52.28 51.73 **51.96** 0.49 52.05 70581.0
2017-05-23 **NaN** 52.44 51.61 52.31 0.24 52.35 9003.0
2017-05-24 52.34 52.63 51.91 52.05 0.23 52.12 11678.0
2017-05-25 52.25 52.61 49.49 49.59 2.28 49.84 19721.0
2017-05-26 49.82 50.73 49.34 50.73 0.82 50.66 11214.0
I needed the script to find any NaN in the 'Open' column and replace it with the 'Last' value from the previous row (highlighted here by double asterisks).
Thank you all for the posts; however, this is what ended up working:
missing = df['Open'].isnull()  # boolean mask of NaNs in 'Open'
new_open = df['Open'].copy()   # work on a copy
# loop over row positions; where 'Open' is missing,
# take the 'Last' value from the previous row and
# store it in new_open at the same position
for i in range(missing.shape[0]):
    # skip i == 0: the first row has no previous row
    if missing.iloc[i] and i > 0:
        new_open.iloc[i] = df['Last'].iloc[i - 1]
# replace the 'Open' values with new 'Open' values
df['Open'] = new_open
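For comparison, the same replacement can probably be written without an explicit loop by shifting 'Last' down one row and using it to fill the gaps in 'Open'. This is a sketch on a three-row excerpt of the data above, not the method the poster used; the column values are copied from the table and the rest is assumed.

```python
import numpy as np
import pandas as pd

# Three rows taken from the posted data, with only the two relevant columns.
df = pd.DataFrame(
    {"Open": [51.97, np.nan, 52.34], "Last": [51.96, 52.31, 52.05]},
    index=pd.to_datetime(["2017-05-22", "2017-05-23", "2017-05-24"]),
)

# shift(1) aligns each row with the previous row's 'Last';
# fillna only touches the rows where 'Open' is NaN.
df["Open"] = df["Open"].fillna(df["Last"].shift(1))
print(df)
```

If the very first row has a missing 'Open', it stays NaN here, because there is no previous row to take 'Last' from.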