
Pandas: merge two data frames using logical OR between common columns

I have two pandas data frames, A and B, indexed with dates:

>>> A
                a      b      c
Timestamp
2018-02-19   True  False  False
2018-02-20  False   True  False
2018-02-21  False  False   True

and

>>> B
                a      b      d
Timestamp
2018-02-19  False   True   True
2018-02-20  False  False  False
2018-02-21   True   True   True

I want to merge these two data frames so that every entry present in both (same index and column) is the logical OR of the two values, while also keeping the columns that are unique to each data frame. In this case, the output would be:

>>> C
                a      b      c      d
Timestamp
2018-02-19   True   True  False   True
2018-02-20  False   True  False  False
2018-02-21   True   True   True   True

Is there a way to do this in pandas?

There's probably a more elegant and generalizable solution, but this will work for the simple example you've given.

import pandas as pd

A = pd.DataFrame({'a': [True, False, False],
                  'b': [False, True, False],
                  'c': [False, False, True]},
                 index=['a', 'b', 'c'])

B = pd.DataFrame({'a': [False, False, True],
                  'b': [True, False, True],
                  'd': [True, False, True]},
                 index=['a', 'b', 'c'])

# OR the frames, keep only the shared columns a and b,
# then append the unique columns c and d unchanged
C = pd.concat([(A | B)[['a', 'b']], A['c'], B['d']], axis=1)

print(C)

       a     b      c      d
a   True  True  False   True
b  False  True  False  False
c   True  True   True   True

This ORs the two frames, which produces the correct result for the columns they have in common (a, b), but not reliable values for columns c and d, which exist in only one frame (depending on the pandas version they come out as NaN or False). So we select just columns a and b from the result, then concatenate c and d taken directly from the original frames, since a column that exists in only one frame should keep its values as-is (x OR False is x).
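The answer builds its example with letter labels for the index. As a quick check, here is the same idea applied to the question's own date-indexed frames (the dates and the `Timestamp` index name come from the question). One small variation from the answer: only the shared columns are ORed, `A[["a", "b"]] | B[["a", "b"]]`, so the result does not depend on how a given pandas version fills the non-shared columns in `A | B`.

```python
import pandas as pd

# Rebuild the question's frames, indexed by date (index named "Timestamp")
dates = pd.DatetimeIndex(["2018-02-19", "2018-02-20", "2018-02-21"], name="Timestamp")

A = pd.DataFrame({"a": [True, False, False],
                  "b": [False, True, False],
                  "c": [False, False, True]}, index=dates)
B = pd.DataFrame({"a": [False, False, True],
                  "b": [True, False, True],
                  "d": [True, False, True]}, index=dates)

# OR only the shared columns, then append the columns unique to each frame
C = pd.concat([A[["a", "b"]] | B[["a", "b"]], A["c"], B["d"]], axis=1)
print(C)
```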

EDIT: Per your comment, here is a more general solution, which saves you from having to know and/or hard-code the specific column names.

# Index.union/intersection/difference are used instead of | and &, which
# no longer act as set operations on an Index in recent pandas versions

# Get all column names
all_columns = A.columns.union(B.columns)

# Get column names in common
common = A.columns.intersection(B.columns)

# Get column names that appear in only one of the frames
only_A = A.columns.difference(B.columns)
only_B = B.columns.difference(A.columns)

# Logical-or common columns, and concatenate disjoint columns
C = pd.concat([A[common] | B[common], A[only_A], B[only_B]], axis=1)

# The concatenation puts the common columns first, so use
# `all_columns` to restore the column order
print(C[all_columns])

       a     b      c      d
a   True  True  False   True
b  False  True  False  False
c   True  True   True   True
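A different technique (not the answer's, offered as a sketch): reindex both frames to the union of their columns with `fill_value=False`, so every column exists in both, and then OR them directly. This assumes that a column missing from one frame should count as False, which matches the desired output, and that both frames share the same index.

```python
import pandas as pd

A = pd.DataFrame({"a": [True, False, False],
                  "b": [False, True, False],
                  "c": [False, False, True]}, index=["a", "b", "c"])
B = pd.DataFrame({"a": [False, False, True],
                  "b": [True, False, True],
                  "d": [True, False, True]}, index=["a", "b", "c"])

# Union of column labels: A's columns first, then any new ones from B
all_columns = A.columns.union(B.columns, sort=False)

# Give each frame every column; columns it lacks are filled with False
A_full = A.reindex(columns=all_columns, fill_value=False)
B_full = B.reindex(columns=all_columns, fill_value=False)

# The frames now have identical labels, so | is a plain element-wise OR
C = A_full | B_full
print(C)
```

Because the missing columns are filled with False rather than NaN, the columns stay boolean and no reordering step is needed.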

EDIT 2: Following kmundnic's final solution, here is an updated version that works on more than two data frames.

# In Python 3, reduce lives in functools
from functools import reduce

import pandas as pd

# A third data frame
C = pd.DataFrame({'a': [False, False, False],
                  'b': [True, True, False],
                  'e': [True, True, True]},
                 index=['a', 'b', 'c'])

def logical_merge(A, B):
    # Get all column names
    all_columns = A.columns.union(B.columns)

    # Get column names in common
    common = A.columns.intersection(B.columns)

    # Get column names that appear in only one of the frames
    only_A = A.columns.difference(B.columns)
    only_B = B.columns.difference(A.columns)

    # Logical-or common columns, concatenate disjoint columns,
    # and restore the column order
    return pd.concat([A[common] | B[common], A[only_A], B[only_B]],
                     axis=1)[all_columns]

frames = [A, B, C]

print(reduce(logical_merge, frames))

       a     b      c      d     e
a   True  True  False   True  True
b  False  True  False  False  True
c   True  True   True   True  True
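If the frames might not share exactly the same dates, one hedged way to extend the idea to any number of frames is to reindex every frame to the union of all row and column labels (again filling gaps with False) and then fold them together with `operator.or_`. Treating a date missing from a frame as False is an assumption, and `logical_or_all` and the third frame `E` are hypothetical names used only for illustration.

```python
import operator
from functools import reduce

import pandas as pd


def logical_or_all(frames):
    """OR together boolean frames that may differ in rows and columns.

    Hypothetical helper: any (row, column) label a frame lacks is
    treated as False.
    """
    # Union of all row labels (sorted) and column labels (first-seen order)
    index = reduce(lambda acc, f: acc.union(f.index), frames[1:], frames[0].index)
    columns = reduce(lambda acc, f: acc.union(f.columns, sort=False),
                     frames[1:], frames[0].columns)

    # Expand each frame to the full label set, filling the gaps with False
    aligned = [f.reindex(index=index, columns=columns, fill_value=False)
               for f in frames]

    # All frames now share identical labels, so fold them with |
    return reduce(operator.or_, aligned)


dates = pd.DatetimeIndex(["2018-02-19", "2018-02-20", "2018-02-21"], name="Timestamp")
A = pd.DataFrame({"a": [True, False, False],
                  "b": [False, True, False],
                  "c": [False, False, True]}, index=dates)
B = pd.DataFrame({"a": [False, False, True],
                  "b": [True, False, True],
                  "d": [True, False, True]}, index=dates)
# Hypothetical third frame that covers one extra day
E = pd.DataFrame({"a": [False, True], "e": [True, False]},
                 index=pd.DatetimeIndex(["2018-02-21", "2018-02-22"], name="Timestamp"))

result = logical_or_all([A, B, E])
print(result)
```

Unlike `logical_merge`, this aligns rows as well as columns, so a date present in only some frames becomes a row of OR-ed values instead of NaN padding. If a missing date should stay missing rather than count as False, this sketch is not the right fit.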
