Dealing with null values when using pd.merge
I need to merge two DataFrames that have a lot of missing values (np.nan, None and (null)).
import numpy as np
import pandas as pd

t1 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 99]]), columns=['a', 'b', 'c'])
t2 = pd.DataFrame(np.array([[1, None, 3, 'hello'], [4, 5, 6, 'moon']]), columns=['a', 'b', 'c', 'd'])
t = pd.merge(t1, t2, how='outer', on=['a', 'c'])
That is, the data frames are:
t1 =
a b c
0 1 2 3
1 4 5 99
t2 =
a b c d
0 1 None 3 hello
1 4 5 6 moon
I need a result df with one row per observation, without losing any data.
Instead, I get an extra row, and the 'None' is kept as a value.
In the example above, I would like to get:
t = pd.DataFrame(np.array([[1, 2, 3, 'hello'], [4, 5, 99, 'moon'], [4, 5, 6, 'moon']]), columns=['a', 'b', 'c', 'd'])
That is:
t =
a b c d
0 1 2 3 hello
1 4 5 99 moon
2 4 5 6 moon
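To see where the extra row and the kept missing value come from, it may help to look at the outer merge directly. The sketch below rebuilds the two frames from plain columns (an assumption made here to keep the dtypes simple; the original builds them from np.array) and adds indicator=True, which records which input each row came from. Because b is not a merge key, pandas keeps both copies as b_x and b_y instead of combining them.

```python
import numpy as np
import pandas as pd

# Same values as the question's t1 and t2, built column by column.
t1 = pd.DataFrame({"a": [1, 4], "b": [2, 5], "c": [3, 99]})
t2 = pd.DataFrame({"a": [1, 4], "b": [np.nan, 5], "c": [3, 6], "d": ["hello", "moon"]})

# Outer merge on the two key columns; "_merge" marks each row's origin.
t = pd.merge(t1, t2, how="outer", on=["a", "c"], indicator=True)

# "b" appears twice (b_x from t1, b_y from t2) because it is not a key,
# and the pairs (4, 99) and (4, 6) do not match, so each gets its own row.
print(t)
```

Only the key pair (1, 3) exists in both frames, so it is the only row marked "both"; the other two rows are one-sided and carry missing values for the columns of the frame they did not come from.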
Yours is a special case, but you can try:
t = pd.concat([pd.merge(t1, t2[['a', 'd']].dropna(), how='left', on='a'), t2.dropna()], ignore_index=True)
The merge uses t1 as the left side of a left join and takes only columns a and d from t2, so d is joined onto t1 by a. pd.concat (the replacement for DataFrame.append, which was removed in pandas 2.0) then appends the row from t2 that is still missing, and dropna() drops the t2 row that contains None.
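If the frames are less regular than in this example, a somewhat more general variant might be to keep the outer merge and then fill the gaps explicitly. The following is only a sketch under two assumptions that are not stated in the question: that the two b columns should be coalesced (the value from t1 is used when present, otherwise the one from t2), and that d depends only on a, so a missing d can be copied from another row with the same a. The suffix names _t1 and _t2 are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Same values as the question's frames, built column by column.
t1 = pd.DataFrame({"a": [1, 4], "b": [2, 5], "c": [3, 99]})
t2 = pd.DataFrame({"a": [1, 4], "b": [np.nan, 5], "c": [3, 6], "d": ["hello", "moon"]})

# Keep every (a, c) pair from both frames; the two "b" columns get suffixes.
merged = pd.merge(t1, t2, how="outer", on=["a", "c"], suffixes=("_t1", "_t2"))

# Coalesce "b": take the value from t1, fall back to t2 where t1 has none.
merged["b"] = merged["b_t1"].combine_first(merged["b_t2"])

# Assumption: "d" is determined by "a", so fill gaps within each "a" group.
merged["d"] = merged.groupby("a")["d"].transform(lambda s: s.ffill().bfill())

# Keep the wanted columns and sort so the row order is deterministic.
result = (
    merged[["a", "b", "c", "d"]]
    .sort_values(["a", "c"])
    .reset_index(drop=True)
)
print(result)
```

This yields the three rows asked for, although b comes out as a float column because it passes through missing values during the merge, and the rows are ordered by a and c instead of in the order shown in the question.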