
Fill in NaN values for left join by sampling from right table

I cannot figure out a nice pandas-style way to fill in the NaN values produced by a left join by sampling from the right table.

For example, joined_left = left.merge(right, how="left", left_on=[attr1], right_on=[attr2]), with left and right being

   0  1  2
0  1  1  1
1  2  2  2
2  3  3  3
3  9  9  9
4  1  3  2

   0  1  2
0  1  2  2
1  1  2  3
2  3  2  2
3  3  2  9
4  3  2  2

produces something like

   0  1_x  2_x  1_y  2_y
0  1    1    1  2.0  2.0
1  1    1    1  2.0  3.0
2  2    2    2  NaN  NaN
3  3    3    3  2.0  2.0
4  3    3    3  2.0  9.0
5  3    3    3  2.0  2.0
6  9    9    9  NaN  NaN
7  1    3    2  2.0  2.0
8  1    3    2  2.0  3.0

How do I fill those rows with a row sampled from the right table instead of leaving NaNs?

This is what I have tried so far (playground):

import numpy as np
import pandas as pd

left = [[1,1,1], [2,2,2], [3,3,3], [9,9,9], [1,3,2]]
right = [[1,2,2], [1,2,3], [3,2,2], [3,2,9], [3,2,2]]
left = pd.DataFrame(np.asarray(left))
right = pd.DataFrame(np.asarray(right))
joined_left = left.merge(right, how="left", left_on=[0], right_on=[0])

# keep sampling a row from right while any NaN remains
while joined_left.isnull().values.any():
    right_sample = right.sample().drop(0, axis=1)
    joined_left.fillna(value=right_sample, limit=1)

print(joined_left)

Basically, I sample a random row and use fillna() to fill in the first occurrence of a NaN value... but for some reason I get no output.

Thank you!

One of the possible outputs could be

   0  1_x  2_x  1_y  2_y
0  1    1    1  2.0  2.0
1  1    1    1  2.0  3.0
2  2    2    2  2.0  2.0
3  3    3    3  2.0  2.0
4  3    3    3  2.0  9.0
5  3    3    3  2.0  2.0
6  9    9    9  2.0  9.0
7  1    3    2  2.0  2.0
8  1    3    2  2.0  3.0

with the sampled rows 3 2 2 and 3 2 9.
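A likely reason the loop in the question prints nothing: fillna() returns a new DataFrame rather than modifying joined_left, and when the fill value is itself a DataFrame, pandas aligns it on index and column labels. The sampled row carries the labels 1 and 2 and some row label from right, not 1_y and 2_y at the position of a NaN row, so nothing lines up, the NaNs stay, and the while loop never reaches the print. The toy frames below (df, filler and aligned are made-up names for illustration, not from the question) sketch both effects:

```python
import numpy as np
import pandas as pd

# Toy frame with NaNs in row 1 (illustrative data, not the question's).
df = pd.DataFrame({"1_y": [2.0, np.nan], "2_y": [3.0, np.nan]})

# A "sampled" row whose labels do not match: columns 1 and 2, row label 0.
filler = pd.DataFrame({1: [7], 2: [8]}, index=[0])

# fillna returns a new object; df itself is not modified.
result = df.fillna(value=filler)
print(df.isnull().values.any())      # still has NaNs
print(result.isnull().values.any())  # labels did not align, so NaNs remain here too

# After renaming the columns and moving the row to label 1, the fill aligns.
aligned = filler.rename(columns={1: "1_y", 2: "2_y"})
aligned.index = [1]
fixed = df.fillna(aligned)
print(fixed)
```

So any fix needs both to keep the returned frame and to give the sampled rows the same labels as the cells they should fill.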

Using sample with fillna:

joined_left = left.merge(right, how="left", left_on=[0], right_on=[0],indicator=True) # adding indicator
joined_left
Out[705]: 
   0  1_x  2_x  1_y  2_y     _merge
0  1    1    1  2.0  2.0       both
1  1    1    1  2.0  3.0       both
2  2    2    2  NaN  NaN  left_only
3  3    3    3  2.0  2.0       both
4  3    3    3  2.0  9.0       both
5  3    3    3  2.0  2.0       both
6  9    9    9  NaN  NaN  left_only
7  1    3    2  2.0  2.0       both
8  1    3    2  2.0  3.0       both
nnull = joined_left['_merge'].eq('left_only').sum()  # number of rows that found no match in the merge
s = right.sample(nnull)  # sample that many rows from right
s.index = joined_left.index[joined_left['_merge'].eq('left_only')]  # give the sampled rows the index labels of the unmatched rows
joined_left.fillna(s.rename(columns={1: '1_y', 2: '2_y'}))  # fillna aligns on index and column labels
Out[706]: 
   0  1_x  2_x  1_y  2_y     _merge
0  1    1    1  2.0  2.0       both
1  1    1    1  2.0  3.0       both
2  2    2    2  2.0  2.0  left_only
3  3    3    3  2.0  2.0       both
4  3    3    3  2.0  9.0       both
5  3    3    3  2.0  2.0       both
6  9    9    9  2.0  3.0  left_only
7  1    3    2  2.0  2.0       both
8  1    3    2  2.0  3.0       both
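
The same idea can be wrapped into a self-contained function. This is only a sketch under a few assumptions that are not in the answer above: the helper name fill_unmatched_by_sampling and its seed parameter are made up for illustration, the join key has the same name in both frames, sampling is done with replacement so that it also works when there are more unmatched rows than rows in right, and the sampled values are assigned by position instead of through fillna.

```python
import pandas as pd


def fill_unmatched_by_sampling(left, right, key, seed=None):
    """Left-join on `key`, then fill rows without a match with rows sampled from `right`."""
    joined = left.merge(right, how="left", on=[key],
                        suffixes=("_x", "_y"), indicator=True)
    unmatched = joined["_merge"].eq("left_only")

    # Non-key columns of right, and the names they carry after the merge:
    # names that also occur in left get the "_y" suffix.
    value_cols = [c for c in right.columns if c != key]
    merged_cols = [f"{c}_y" if c in left.columns else c for c in value_cols]

    if unmatched.any():
        # Assumption: sample with replacement, one row per unmatched row.
        sampled = right[value_cols].sample(
            n=int(unmatched.sum()), replace=True, random_state=seed
        )
        # Assign by position so index labels do not need to be aligned.
        joined.loc[unmatched, merged_cols] = sampled.to_numpy()

    return joined.drop(columns="_merge")


left = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3], [9, 9, 9], [1, 3, 2]])
right = pd.DataFrame([[1, 2, 2], [1, 2, 3], [3, 2, 2], [3, 2, 9], [3, 2, 2]])

result = fill_unmatched_by_sampling(left, right, key=0, seed=0)
print(result)
```

Which rows get sampled depends on the seed, so the values filled into rows 2 and 6 vary from run to run unless seed is fixed.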
