How to specify the method for fillna in pandas

Suppose I have a dataframe like:

A   B
a1  b1
a2  NaN
a3  NaN

How do I fill the NaNs with, say, (b1/a1)*a2 and (b1/a1)*a3?

I guess something like df.apply(pd.Series.my_function) has to be used. Could someone help me out with this?

Edit: My example must have been misleading. The NaNs can appear anywhere in the dataframe, and I want to fill each one with (closest non-NaN B / closest non-NaN A) * a2, where a2 stands for the A value in the row being filled.

If I understand correctly, you're looking for something like:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([[i, 20.0+i] for i in range(10)], columns=['a','b'])
>>> df.loc[[3,4,5,8], 'b'] = np.nan
>>> print(df)
   a     b
0  0  20.0
1  1  21.0
2  2  22.0
3  3   NaN
4  4   NaN
5  5   NaN
6  6  26.0
7  7  27.0
8  8   NaN
9  9  29.0
>>> nan_indices = df.index[df['b'].isna()].values
>>> for nan_index in nan_indices:
...     # Positions of the nearest non-NaN values before and after this NaN
...     last_non_nan_before_nan = np.where(df['b'].iloc[:nan_index].notna())[0][-1]
...     first_non_nan_after_nan = nan_index + 1 + np.where(df['b'].iloc[nan_index+1:].notna())[0][0]
...     # On a tie, prefer the value that follows the NaN
...     if nan_index - last_non_nan_before_nan >= first_non_nan_after_nan - nan_index:
...         index_of_closest_non_nan_value = first_non_nan_after_nan
...     else:
...         index_of_closest_non_nan_value = last_non_nan_before_nan
...     # Fill with (closest b / closest a) * a of the current row
...     df.loc[nan_index, 'b'] = df.loc[index_of_closest_non_nan_value, 'b'] / \
...                              df.loc[index_of_closest_non_nan_value, 'a'] * \
...                              df.loc[nan_index, 'a']
...
>>> print(df)
   a          b
0  0  20.000000
1  1  21.000000
2  2  22.000000
3  3  33.000000
4  4  44.000000
5  5  21.666667
6  6  26.000000
7  7  27.000000
8  8  25.777778
9  9  29.000000

You'll have to handle the edge cases yourself: a 0 in either column (a 0 in a makes the ratio b/a undefined), a NaN at the beginning or the end of the DataFrame, etc.
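One possible way to cover those edge cases is sketched below. It is not the loop above rewritten: it measures distance only to rows whose ratio was valid in the original data, so already-filled rows are never reused as sources and some results differ (row 4 of the example, for instance). The function name `fill_by_nearest_ratio` and its `num`/`den` parameters are made up for illustration, and the sketch assumes a default integer index is not required (it works by position).

```python
import numpy as np
import pandas as pd


def fill_by_nearest_ratio(df, num="b", den="a"):
    """Fill NaNs in `num` with (nearest valid num/den ratio) * den of the row.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    out = df.copy()
    ratio = out[num] / out[den]
    # Treat undefined ratios (division by zero, NaN in either column) as missing
    ratio = ratio.replace([np.inf, -np.inf], np.nan)
    valid_pos = np.flatnonzero(ratio.notna().to_numpy())
    if valid_pos.size == 0:
        return out  # no valid ratio anywhere, nothing to base a fill on
    all_pos = np.arange(len(out))
    # Nearest valid position at or after each row, and the one before it;
    # clipping handles NaNs at the very beginning or end of the frame
    right = np.searchsorted(valid_pos, all_pos).clip(max=valid_pos.size - 1)
    left = (right - 1).clip(min=0)
    # On a tie, prefer the following row (same tie-break as the loop above)
    use_right = np.abs(valid_pos[right] - all_pos) <= np.abs(all_pos - valid_pos[left])
    nearest = np.where(use_right, valid_pos[right], valid_pos[left])
    filled = ratio.to_numpy()[nearest] * out[den].to_numpy()
    mask = out[num].isna().to_numpy()
    out.loc[mask, num] = filled[mask]
    return out


# Usage with the same data as the example above
example = pd.DataFrame({"a": np.arange(10), "b": np.arange(20.0, 30.0)})
example.loc[[3, 4, 5, 8], "b"] = np.nan
print(fill_by_nearest_ratio(example))
```

The input frame is left untouched and a filled copy is returned. If no row has a usable ratio, the copy is returned with its NaNs still in place.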

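import numpy as np
import pandas as pd

# Float data so NaN can be assigned without changing the dtype
df = pd.DataFrame(np.reshape(np.arange(10.0), (5, 2)), columns=['A', 'B'])
df.iloc[2, 1] = np.nan
df.iloc[3, 1] = np.nan
# Ratio B/A, carried forward from the last row where B is known
df['C'] = df['B'] / df['A']
df['C'] = df['C'].ffill()
# Rows that contain a NaN; rebuild their B from the carried ratio
nan = df[df.isnull().any(axis=1)].copy()
nan['B'] = nan['C'] * nan['A']
# Merge back on the index and fill the missing B values
bla = pd.merge(df, nan, how='left', left_index=True, right_index=True)
missing = bla['B_x'].isnull()
bla.loc[missing, 'B_x'] = bla.loc[missing, 'B_y']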

This solved my problem, since A can never be 0 or NaN in my data. I think Kracit's answer would be helpful when A can be 0 or NaN.
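For what it's worth, the same forward-fill idea can probably be written without the helper column and the merge, because `Series.fillna` accepts another Series and aligns it on the index. This is only a sketch under the same assumption (A is never 0 or NaN in the rows that supply the ratio); the variable name `ratio` is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.reshape(np.arange(10.0), (5, 2)), columns=["A", "B"])
df.iloc[[2, 3], 1] = np.nan

# Ratio B/A from the last row where B was known, carried forward over the gaps
ratio = (df["B"] / df["A"]).ffill()
# Keep existing B values; only the NaNs are replaced with ratio * A
df["B"] = df["B"].fillna(ratio * df["A"])
print(df)
```

Unlike the nearest-value approach, this always takes the ratio from the preceding known row, so a NaN in the first row stays unfilled.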
