
Pandas merge two data frames using logical or between common columns

I have two pandas data frames, A and B, both indexed by date:

>>> A
                a      b      c
Timestamp
2018-02-19   True  False  False
2018-02-20  False   True  False
2018-02-21  False  False   True

and

>>> B
                a      b      d
Timestamp
2018-02-19  False   True   True
2018-02-20  False  False  False
2018-02-21   True   True   True

I want to merge these two data frames so that each entry (index, column) common to both is the logical OR of the two values, while the columns unique to each data frame are also included. In this case, the output would be:

>>> C
                a      b      c      d
Timestamp
2018-02-19   True   True  False   True
2018-02-20  False   True  False  False
2018-02-21   True   True   True   True

Is there a way to do this in pandas?

There's probably a more elegant and generalizable solution, but this will work for the simple example you've given.

import pandas as pd

A = pd.DataFrame({'a': [True, False, False],
                  'b': [False, True, False],
                  'c': [False, False, True]},
                 index=['a', 'b', 'c'])

B = pd.DataFrame({'a': [False, False, True],
                  'b': [True, False, True],
                  'd': [True, False, True]},
                 index=['a', 'b', 'c'])

# OR the frames, keep the shared columns, then append the unique ones
C = pd.concat([(A | B)[['a', 'b']], A['c'], B['d']], axis=1)

print(C)

       a     b      c      d
a   True  True  False   True
b  False  True  False  False
c   True  True   True   True

This ORs the two frames, which produces the correct result for the columns in common (a, b), but NaN for columns c and d, since each exists in only one frame. So we keep just columns a and b from the OR result, then concatenate the original c and d, which the merge should leave unchanged.
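As a quick check against the question's own data, here is a minimal sketch of the same idea using the date index from the question. It ORs only the shared columns, so the unmatched labels never enter the operation. The variable name `shared` is just for illustration.

```python
import pandas as pd

# Rebuild the frames from the question, with the date index
idx = pd.DatetimeIndex(["2018-02-19", "2018-02-20", "2018-02-21"],
                       name="Timestamp")

A = pd.DataFrame({"a": [True, False, False],
                  "b": [False, True, False],
                  "c": [False, False, True]}, index=idx)

B = pd.DataFrame({"a": [False, False, True],
                  "b": [True, False, True],
                  "d": [True, False, True]}, index=idx)

# OR only the columns both frames share
shared = A[["a", "b"]] | B[["a", "b"]]

# Append the columns that exist in just one frame, untouched
C = pd.concat([shared, A["c"], B["d"]], axis=1)

print(C)
```

Rows are aligned on the index, so this assumes both frames cover the same dates.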

EDIT: Per your comment, here is a more general solution, which saves you from having to know or hardcode the specific column names.

# Get all column names
all_columns = A.columns.union(B.columns)

# Get column names in common
common = A.columns.intersection(B.columns)

# Get the column names unique to each frame
only_A = A.columns.difference(B.columns)
only_B = B.columns.difference(A.columns)

# Logical-or common columns, and concatenate disjoint columns
C = pd.concat([A[common] | B[common], A[only_A], B[only_B]], axis=1)

# If the column order gets shuffled along the way, use
# `all_columns` to restore it

print(C[all_columns])

       a     b      c      d
a   True  True  False   True
b  False  True  False  False
c   True  True   True   True

EDIT 2: Per kmundnic's final solution, here is an updated version that works on more than two data frames.

# For Python 3
from functools import reduce

# A third data frame
C = pd.DataFrame({'a':[False, False, False],
                  'b':[True, True, False], 
                  'e': [True, True, True]}, 
                  index=['a','b','c'])

def logical_merge(A, B):

    # Get all column names
    all_columns = A.columns.union(B.columns)

    # Get column names in common
    common = A.columns.intersection(B.columns)

    # Get disjoint column names
    _A = [x for x in B.columns if x not in common]
    _B = [x for x in A.columns if x not in common]

    # Logical-or common columns, and concatenate disjoint columns
    return pd.concat([A[common] | B[common], A[_B], B[_A]], axis=1)[all_columns]

frames = [A, B, C]

print(reduce(logical_merge, frames))

       a     b      c      d     e
a   True  True  False   True  True
b  False  True  False  False  True
c   True  True   True   True  True
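A different way to get the same result, sketched here as an alternative rather than as part of the solution above, is to pad every frame to the full set of labels with False and then OR them directly. This relies on `x | False == x`, so the padding never changes a real value. The helper name `or_merge` is hypothetical, and the sketch assumes all columns hold booleans.

```python
from functools import reduce

import pandas as pd


def or_merge(left, right):
    """Hypothetical helper: OR two boolean frames over the union of their labels."""
    cols = left.columns.union(right.columns)
    rows = left.index.union(right.index)

    # Pad each frame with False wherever a row or column is missing
    left_full = left.reindex(index=rows, columns=cols, fill_value=False)
    right_full = right.reindex(index=rows, columns=cols, fill_value=False)

    # Both frames now share identical labels, so | is purely elementwise
    return left_full | right_full


idx = ["a", "b", "c"]
A = pd.DataFrame({"a": [True, False, False],
                  "b": [False, True, False],
                  "c": [False, False, True]}, index=idx)
B = pd.DataFrame({"a": [False, False, True],
                  "b": [True, False, True],
                  "d": [True, False, True]}, index=idx)
C = pd.DataFrame({"a": [False, False, False],
                  "b": [True, True, False],
                  "e": [True, True, True]}, index=idx)

merged = reduce(or_merge, [A, B, C])
print(merged)
```

Unlike the concat version, this also tolerates frames whose indexes differ, treating any missing row as all False. Whether that is the right default depends on your data.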
