Fill NaN values of a left join by sampling from the right table

Question

I can't figure out a good, idiomatic pandas way to fill the NaN values produced by a left join by sampling rows from the right table.

For example: joined_left = left.merge(right, how="left", left_on=[attr1], right_on=[attr2])

produces something like

   0  1_x  2_x  1_y  2_y
0  1    1    1  2.0  2.0
1  1    1    1  2.0  3.0
2  2    2    2  NaN  NaN
3  3    3    3  2.0  2.0
4  3    3    3  2.0  9.0
5  3    3    3  2.0  2.0
6  9    9    9  NaN  NaN
7  1    3    2  2.0  2.0
8  1    3    2  2.0  3.0

How can I sample a row from the right table instead of leaving NaN?

Here is a playground with what I have tried so far:

import numpy as np
import pandas as pd

left = [[1,1,1], [2,2,2],[3,3,3], [9,9,9], [1,3,2]]
right = [[1,2,2],[1,2,3],[3,2,2], [3,2,9], [3,2,2]]
left = np.asarray(left)
right = np.asarray(right)
left = pd.DataFrame(left)
right = pd.DataFrame(right)
joined_left = left.merge(right, how="left", left_on=[0], right_on=[0])

while(joined_left.isnull().values.any()):
    right_sample = right.sample().drop(0, axis=1)
    joined_left.fillna(value=right_sample, limit=1)

print(joined_left)

Basically, it samples a random row and uses fillna() to fill the first occurrence of NaN... but for some reason I get no output.

Thanks!

One possible output would be

   0  1_x  2_x  1_y  2_y
0  1    1    1  2.0  2.0
1  1    1    1  2.0  3.0
2  2    2    2  2.0  2.0
3  3    3    3  2.0  2.0
4  3    3    3  2.0  9.0
5  3    3    3  2.0  2.0
6  9    9    9  2.0  9.0
7  1    3    2  2.0  2.0
8  1    3    2  2.0  3.0

obtained by sampling the rows 3 2 2 and 3 2 9

Answer 1

Use sample together with fillna

joined_left = left.merge(right, how="left", left_on=[0], right_on=[0], indicator=True)  # indicator=True adds the _merge column
joined_left
Out[705]: 
   0  1_x  2_x  1_y  2_y     _merge
0  1    1    1  2.0  2.0       both
1  1    1    1  2.0  3.0       both
2  2    2    2  NaN  NaN  left_only
3  3    3    3  2.0  2.0       both
4  3    3    3  2.0  9.0       both
5  3    3    3  2.0  2.0       both
6  9    9    9  NaN  NaN  left_only
7  1    3    2  2.0  2.0       both
8  1    3    2  2.0  3.0       both
nnull = joined_left['_merge'].eq('left_only').sum()  # number of rows in the merged frame that found no match in right
s = right.sample(nnull)  # sample that many rows from right
s.index = joined_left.index[joined_left['_merge'].eq('left_only')]  # give the sampled rows the index labels of the unmatched rows
joined_left.fillna(s.rename(columns={1: '1_y', 2: '2_y'}))  # fillna aligns on index and column labels
Out[706]: 
   0  1_x  2_x  1_y  2_y     _merge
0  1    1    1  2.0  2.0       both
1  1    1    1  2.0  3.0       both
2  2    2    2  2.0  2.0  left_only
3  3    3    3  2.0  2.0       both
4  3    3    3  2.0  9.0       both
5  3    3    3  2.0  2.0       both
6  9    9    9  2.0  3.0  left_only
7  1    3    2  2.0  2.0       both
8  1    3    2  2.0  3.0       both
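
The loop in the question appears to run forever because the return value of fillna is discarded, so joined_left never changes; the sampled row also keeps the column labels 1 and 2 rather than '1_y' and '2_y', so it would not line up with the NaN cells anyway. The steps above can be wrapped into one function, shown here as a minimal sketch rather than a definitive implementation. The function name fill_unmatched_from_right and its parameters are hypothetical, and passing replace=True to sample is an assumption added so the draw still works when there are more unmatched rows than rows in right.

```python
import pandas as pd


def fill_unmatched_from_right(left, right, key, random_state=None):
    """Left-join on `key`; unmatched left rows receive the non-key values
    of randomly sampled rows of `right` (hypothetical helper)."""
    joined = left.merge(right, how="left", on=[key], indicator=True)
    unmatched = joined["_merge"].eq("left_only")
    joined = joined.drop(columns="_merge")

    # Non-key columns of `right`, renamed to the labels they carry after the
    # merge (pandas appends "_y" when the label also exists in `left`).
    value_cols = [c for c in right.columns if c != key]
    merged_names = {
        c: (f"{c}_y" if c in left.columns else c) for c in value_cols
    }

    # Assumption: sample with replacement, so the number of unmatched rows
    # may exceed len(right).
    sampled = right[value_cols].sample(
        n=int(unmatched.sum()), replace=True, random_state=random_state
    )
    sampled.index = joined.index[unmatched]
    sampled = sampled.rename(columns=merged_names)

    # fillna with a DataFrame aligns on index and column labels, so only the
    # NaN cells of the unmatched rows are filled.
    return joined.fillna(sampled)


left = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3], [9, 9, 9], [1, 3, 2]])
right = pd.DataFrame([[1, 2, 2], [1, 2, 3], [3, 2, 2], [3, 2, 9], [3, 2, 2]])
filled = fill_unmatched_from_right(left, right, key=0, random_state=0)
print(filled)
```

Because the rows are drawn at random, the filled values differ between runs unless random_state is fixed.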

Solution 1
1 Accepted 2018-11-11 03:06:03
