
How to match multiple columns against a single column and get the matching column's name in a new column?

I want to compare one column against several other columns. For each row, if a column's value matches, return that column's name.

My demo df is:

import pandas as pd

df = pd.DataFrame({"mtc": ["A", "B", "C", "D"],
                   "C1": ["A", "A", "A", "C"],
                   "C2": ["X", "B", "A", "C"],
                   "C3": ["Y", "D", "A", "D"],
                   "C4": ["Z", "D", "C", "C"]})


    mtc C1  C2  C3  C4
0   A   A   X   Y   Z
1   B   A   B   D   D
2   C   A   A   A   C
3   D   C   C   D   C

Here I want to match the values in the mtc column against the columns ['C1', 'C2', 'C3', 'C4'].

My expected output, in a new Result column, is:

   mtc  Result  C1  C2  C3  C4
0   A     C1    A   X   Y   Z
1   B     C2    A   B   D   D
2   C     C4    A   A   A   C
3   D     C3    C   C   D   C

Solution

m = df.filter(like='C').eq(df['mtc'], axis=0)
# axis is passed by keyword: recent pandas versions reject a positional axis in any()
df['Result'] = m.idxmax(axis=1).mask(~m.any(axis=1))

Explanations

Filter the columns whose names contain "C", then compare these columns with the mtc column along axis=0 to create a boolean mask.

>>> m
      C1     C2     C3     C4
0   True  False  False  False
1  False   True  False  False
2  False  False  False   True
3  False  False   True  False

Now we can use idxmax along axis=1 to get the name of the column containing the first True value in each row of the boolean mask. Because idxmax returns the first column even when a row contains no True at all, we also mask the result for rows without any match, which leaves NaN there.

>>> m.idxmax(1)

0    C1
1    C2
2    C4
3    C3
dtype: object
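
To see why the masking step matters, here is a small sketch with a row that has no match. The frame `demo` and its values are made up for illustration and are not part of the question's data; the columns are also selected explicitly rather than with filter(like='C').

```python
import pandas as pd

# Hypothetical data: the second row has no column equal to mtc.
demo = pd.DataFrame({"mtc": ["A", "Q"],
                     "C1": ["A", "A"],
                     "C2": ["X", "B"]})

m = demo[["C1", "C2"]].eq(demo["mtc"], axis=0)

# idxmax alone reports the first column label for the all-False row.
raw = m.idxmax(axis=1)

# Masking rows without any True replaces that spurious label with NaN.
result = raw.mask(~m.any(axis=1))
```

Without the mask, the second row would be labelled C1 even though nothing in it equals mtc.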

>>> df

  mtc C1 C2 C3 C4 Result
0   A  A  X  Y  Z     C1
1   B  A  B  D  D     C2
2   C  A  A  A  C     C4
3   D  C  C  D  C     C3

import pandas as pd

df = pd.DataFrame({"mtc": ["A", "B", "C", "D"],
                   "C1": ["A", "A", "A", "C"],
                   "C2": ["X", "B", "A", "C"],
                   "C3": ["Y", "D", "A", "D"],
                   "C4": ["Z", "D", "C", "C"]})

def find_col(row):
    # Walk the columns after mtc and return the first one whose value equals mtc.
    for col in row.index[1:]:
        if row['mtc'] == row[col]:
            return col
    return None  # no column matched in this row

df['result'] = df.apply(find_col, axis=1)

This gives the output:

  mtc C1 C2 C3 C4 result
0   A  A  X  Y  Z     C1
1   B  A  B  D  D     C2
2   C  A  A  A  C     C4
3   D  C  C  D  C     C3

You can melt() with mtc as the id column and keep the first row where mtc == value for each mtc:

m = df.melt('mtc')
m = m[m.mtc == m.value].drop_duplicates(subset='mtc', keep='first')

#    mtc variable value
# 0    A       C1     A
# 5    B       C2     B
# 11   D       C3     D
# 14   C       C4     C

Then map() the matched column names back to df:

df['Result'] = df.mtc.map(m.set_index('mtc').variable)

#   mtc C1 C2 C3 C4 Result
# 0   A  A  X  Y  Z     C1
# 1   B  A  B  D  D     C2
# 2   C  A  A  A  C     C4
# 3   D  C  C  D  C     C3
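
One caveat with this approach: map() looks results up by mtc value, so it relies on the values in mtc being unique. If mtc could repeat, a possible variant is to carry the row label through melt() and deduplicate per row instead. The sketch below assumes an unnamed default index (so reset_index() produces a column called "index") and uses made-up data in a hypothetical frame `dup`.

```python
import pandas as pd

# Hypothetical data: mtc repeats, but each row matches a different column.
dup = pd.DataFrame({"mtc": ["A", "A"],
                    "C1": ["A", "X"],
                    "C2": ["Y", "A"]})

# Carry the row label through melt so each match stays tied to its row.
long = dup.reset_index().melt(id_vars=["index", "mtc"])
hits = long[long["mtc"] == long["value"]].drop_duplicates(subset="index", keep="first")

# Assignment aligns on the row label; rows without a match become NaN.
dup["Result"] = hits.set_index("index")["variable"]
```

Since melt() stacks the value columns in their original order, keep='first' still picks the leftmost matching column for each row.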

