How to specify the method for fillna in pandas
Suppose I have a dataframe like:
A   B
a1  b1
a2  NaN
a3  NaN
How do I fill the NaNs with, say, (b1/a1)*a2 and (b1/a1)*a3?
I guess something like df.apply(pd.Series.my_function) has to be used. Could someone help me out with this?
Edit: My representation must have been misleading. The NaNs can appear anywhere in the dataframe, and I want to fill each NaN with (closest non-NaN B / closest non-NaN A) * A, where the last A is the value in the row being filled (a2 in the example above).
If I understand correctly, you're looking for something like:
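>>> import numpy as np
>>> import pandas as pd
>>> # Build b as float so it can hold NaN without a dtype change
>>> df = pd.DataFrame([[i, 20.0 + i] for i in range(10)], columns=['a', 'b'])
>>> df.loc[[3, 4, 5, 8], 'b'] = np.nan  # .loc avoids chained assignment
>>> print(df)
   a     b
0  0  20.0
1  1  21.0
2  2  22.0
3  3   NaN
4  4   NaN
5  5   NaN
6  6  26.0
7  7  27.0
8  8   NaN
9  9  29.0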
>>> nan_indices = df.index[df['b'].isna()].values
>>> for nan_index in nan_indices:
...     # Position of the last non-NaN b before this row (earlier fills count)
...     last_non_nan_before_nan = np.where(df['b'].iloc[:nan_index].notna())[0][-1]
...     # Position of the first non-NaN b after this row
...     first_non_nan_after_nan = nan_index + 1 + np.where(df['b'].iloc[nan_index + 1:].notna())[0][0]
...     # On a tie, prefer the later row
...     if nan_index - last_non_nan_before_nan >= first_non_nan_after_nan - nan_index:
...         index_of_closest_non_nan_value = first_non_nan_after_nan
...     else:
...         index_of_closest_non_nan_value = last_non_nan_before_nan
...     df.loc[nan_index, 'b'] = (df.loc[index_of_closest_non_nan_value, 'b']
...                               / df.loc[index_of_closest_non_nan_value, 'a']
...                               * df.loc[nan_index, 'a'])
...
>>> print(df)
   a          b
0  0  20.000000
1  1  21.000000
2  2  22.000000
3  3  33.000000
4  4  44.000000
5  5  21.666667
6  6  26.000000
7  7  27.000000
8  8  25.777778
9  9  29.000000
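You'll have to handle the edge cases yourself (b equal to 0, a NaN at the beginning or the end of the DataFrame, etc.). As written, the loop raises an IndexError when a NaN has no non-NaN value before it or after it, because the [0][-1] and [0][0] lookups then index into an empty array.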
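The edge cases mentioned above could be handled along the following lines. This is only a sketch, not the answer's original code: the function name fill_from_closest_ratio and its parameters are made up for illustration, and it makes two assumptions that differ from the loop above. It only uses rows where b was originally present and a is non-zero as sources (so previously filled values are not reused), and it leaves the NaNs in place if no such row exists. Ties still go to the later row.

```python
import numpy as np
import pandas as pd


def fill_from_closest_ratio(df, a="a", b="b"):
    """Fill NaNs in column b with (closest b / closest a) * a of the row.

    Hypothetical helper: 'closest' means the nearest row (by position)
    where b is present and a is neither NaN nor 0.
    """
    out = df.copy()
    a_vals = out[a].to_numpy(dtype=float)
    b_vals = out[b].to_numpy(dtype=float, copy=True)

    # Rows that can serve as a source for the ratio b / a.
    usable = ~np.isnan(b_vals) & ~np.isnan(a_vals) & (a_vals != 0)
    usable_pos = np.flatnonzero(usable)
    if usable_pos.size == 0:
        return out  # nothing to scale from; NaNs are left untouched

    for pos in np.flatnonzero(np.isnan(b_vals)):
        dist = np.abs(usable_pos - pos)
        # Reverse before argmin so that ties resolve to the later row.
        closest = usable_pos[len(dist) - 1 - dist[::-1].argmin()]
        b_vals[pos] = b_vals[closest] / a_vals[closest] * a_vals[pos]

    out[b] = b_vals
    return out


# A NaN in the first and last row no longer raises an IndexError.
demo = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [np.nan, 4.0, np.nan]})
filled = fill_from_closest_ratio(demo)
```

Because the search looks at all usable rows rather than "one before, one after", a NaN at either end of the frame simply picks the nearest usable row on the side that has one.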
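import numpy as np
import pandas as pd

# Float data so B can hold NaN; A is 0, 2, 4, 6, 8 and B is 1, 3, 5, 7, 9
df = pd.DataFrame(np.arange(10.0).reshape(5, 2), columns=['A', 'B'])
df.iloc[2, 1] = np.nan
df.iloc[3, 1] = np.nan
# Ratio B/A, carried forward from the last row where B is known
df['C'] = df['B'] / df['A']
df['C'] = df['C'].ffill()
# Rows with a missing value; copy so assigning to B does not touch a view of df
nan = df[df.isnull().any(axis=1)].copy()
nan['B'] = nan['C'] * nan['A']
# Left merge on the index puts B_x (original) and B_y (computed) side by side
bla = pd.merge(df, nan, how='left', left_index=True, right_index=True)
mask = bla['B_x'].isnull()
bla.loc[mask, 'B_x'] = bla.loc[mask, 'B_y']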
This solved my problem, since A can never be 0 or NaN in my data. I think Kracit's answer would be helpful when A can be 0 or NaN.
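For what it's worth, the same forward-filled ratio idea can probably be written without the merge step, since Series.fillna accepts another Series and aligns it on the index. The sketch below assumes, as in the post above, that A is never 0 or NaN; the column values are made-up sample data.

```python
import numpy as np
import pandas as pd

# Made-up sample data: B is missing in rows 2 and 3.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [10.0, 20.0, np.nan, np.nan, 50.0],
})

# Ratio B/A where B is known, carried forward over the gaps.
ratio = (df["B"] / df["A"]).ffill()

# fillna with a Series only touches the rows where B is NaN.
df["B"] = df["B"].fillna(ratio * df["A"])
```

Swapping ffill() for bfill() would take the ratio from the next known row instead of the previous one; neither picks the closest row in both directions.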