Pandas merge two data frames using logical or between common columns
I have two pandas data frames, A and B, indexed with dates:
>>> A
a b c
Timestamp
2018-02-19 True False False
2018-02-20 False True False
2018-02-21 False False True
and
>>> B
a b d
Timestamp
2018-02-19 False True True
2018-02-20 False False False
2018-02-21 True True True
I want to merge these two data frames so that the merged data frame is the logical or of each common entry (index, column), and also includes the columns that are unique to each data frame. In this case, the output would be:
>>> C
a b c d
Timestamp
2018-02-19 True True False True
2018-02-20 False True False False
2018-02-21 True True True True
Is there a way to do this in pandas?
There's probably a more elegant and generalizable solution, but this will work for the simple example you've given.
import pandas as pd

A = pd.DataFrame({"a": [True, False, False],
                  "b": [False, True, False],
                  "c": [False, False, True]},
                 index=['a', 'b', 'c'])
B = pd.DataFrame({"a": [False, False, True],
                  "b": [True, False, True],
                  "d": [True, False, True]},
                 index=['a', 'b', 'c'])
# OR the frames, keep only the shared columns, then append the unique ones
C = pd.concat([(A | B)[['a', 'b']], A['c'], B['d']], axis=1)
print(C)
a b c d
a True True False True
b False True False False
c True True True True
This ORs the two frames, which produces the correct result for the columns in common (a, b), but NaN for columns c and d, each of which exists in only one frame. So we slice off columns a and b from the OR result, then concatenate them with c and d taken straight from the original frames, where they are unchanged.
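A variation on the same idea, offered as a sketch rather than as part of the original answer: instead of slicing and concatenating, each frame can first be given the other's missing columns filled with False, so that a single `|` covers every column. The names `A_full` and `B_full` are made up for illustration, and the example assumes both frames share the same index and hold only booleans.

```python
import pandas as pd

A = pd.DataFrame({"a": [True, False, False],
                  "b": [False, True, False],
                  "c": [False, False, True]},
                 index=["a", "b", "c"])
B = pd.DataFrame({"a": [False, False, True],
                  "b": [True, False, True],
                  "d": [True, False, True]},
                 index=["a", "b", "c"])

# Sorted union of the column labels: a, b, c, d
all_columns = A.columns.union(B.columns)

# Add each frame's missing columns as all-False so both frames carry the
# same labels; astype(bool) guards against a non-bool dtype after reindexing
A_full = A.reindex(columns=all_columns, fill_value=False).astype(bool)
B_full = B.reindex(columns=all_columns, fill_value=False).astype(bool)

# With identical labels on both sides, `|` is a plain element-wise OR
C = A_full | B_full
print(C)
```

Because False is the identity for OR, the columns unique to one frame pass through unchanged.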
EDIT: Per your comment, here is a more general solution, which saves you from having to know or hardcode the specific column names.
# Get all column names (sorted union). Use the explicit Index methods:
# in pandas 2.x, `|` and `&` on an Index are no longer set operations
all_columns = A.columns.union(B.columns)
# Get column names in common
common = A.columns.intersection(B.columns)
# Get column names unique to each frame
only_A = A.columns.difference(B.columns)
only_B = B.columns.difference(A.columns)
# Logical-or common columns, and concatenate the unique columns
C = pd.concat([A[common] | B[common], A[only_A], B[only_B]], axis=1)
# The concatenation puts the common columns first, so use
# `all_columns` to restore a consistent order
print(C[all_columns])
a b c d
a True True False True
b False True False False
c True True True True
EDIT 2: Per kmundnic's final solution, here is an updated version that works on more than two data frames.
# For Python 3
from functools import reduce

# A third data frame
C = pd.DataFrame({'a': [False, False, False],
                  'b': [True, True, False],
                  'e': [True, True, True]},
                 index=['a', 'b', 'c'])

def logical_merge(A, B):
    # Get all column names (sorted union)
    all_columns = A.columns.union(B.columns)
    # Get column names in common
    common = A.columns.intersection(B.columns)
    # Get column names unique to each frame
    only_B = [x for x in B.columns if x not in common]
    only_A = [x for x in A.columns if x not in common]
    # Logical-or common columns, and concatenate the unique columns
    return pd.concat([A[common] | B[common], A[only_A], B[only_B]], axis=1)[all_columns]

frames = [A, B, C]
print(reduce(logical_merge, frames))
a b c d e
a True True False True True
b False True False False True
c True True True True True
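For comparison, here is a different technique from the `reduce` approach above, sketched under the assumption that every frame shares the same index and contains only booleans: concatenate all frames side by side, then collapse columns that share a name with `any()`. The variable names `stacked` and `merged` are illustrative only.

```python
import pandas as pd

idx = ["a", "b", "c"]
A = pd.DataFrame({"a": [True, False, False],
                  "b": [False, True, False],
                  "c": [False, False, True]}, index=idx)
B = pd.DataFrame({"a": [False, False, True],
                  "b": [True, False, True],
                  "d": [True, False, True]}, index=idx)
C = pd.DataFrame({"a": [False, False, False],
                  "b": [True, True, False],
                  "e": [True, True, True]}, index=idx)

# Side-by-side concatenation keeps duplicate column names (a, b appear 3x)
stacked = pd.concat([A, B, C], axis=1)

# Transpose so column names become row labels, group rows with the same
# label, reduce each group with any() (a logical OR), and transpose back
merged = stacked.T.groupby(level=0).any().T
print(merged)
```

This handles any number of frames in one pass, at the cost of two transposes; the grouped labels come back in sorted order.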