How to get the intersection item between two dataframe columns?

Question

[Image example]

As shown in the picture above, how can I find the total number of items that appear in both the 'Actual' and 'Predict' columns for every userId? The object is a pandas.core.frame.DataFrame.

The code to construct the example table is as follows:

import pandas as pd
import numpy as np

# Build the 'Actual' column: one list of item ids per user (the lists differ in length)
data = pd.DataFrame(np.array([[32, 256, 5, 102, 74, 171, 270, 111, 209, 24],
                [1, 258, 257, 281, 10, 269, 14, 13, 272, 273],
                [258, 260, 264, 11, 271, 288, 294, 300, 301],
                [9, 10, 11, 12, 22, 28],
                [1, 514, 2, 516, 4, 13, 526, 527, 1037, 529, 256, 678],
                [1, 1028, 7, 9, 1033, 15, 1047, 25, 546, 1061],
                [258, 259, 514, 261, 131, 135, 520, 265, 1028, 50],
                [2, 11, 12, 526, 1044, 22, 23, 27, 541, 54, 88],
                [332, 168, 79, 343, 38, 1007, 9, 232, 381, 1079],
                [38, 168, 561, 542, 69, 20, 79, 385, 332, 480]],
                dtype=object))  # rows are ragged, so NumPy needs an object array here

test_actual = data.rename(columns={0: "Actual"})
test_actual['userId'] = [1,2,3,5,6,8,10,12,15,18]
test_actual = test_actual.set_index('userId')

data2 = [[154, 248, 237, 223, 83, 283, 69, 32, 480, 325],
         [332, 168, 38, 9, 385, 258, 561, 41, 79, 542],
         [322, 258, 226, 232, 1007, 343, 332, 260, 561, 381],
         [237, 154, 196, 223, 523, 277, 226, 748, 323, 28],
         [168, 332, 38, 9, 83, 561, 232, 526, 1007, 20],
         [79, 38, 480, 168, 232, 561, 653, 9, 542, 996],
         [9, 232, 332, 523, 168, 322, 7, 1028, 41, 542],
         [83, 168, 232, 322, 385, 223, 154, 941, 283, 12],
         [69, 38, 196, 480, 83, 385, 20, 343, 283, 542],
         [480, 38, 69, 83, 385, 154, 542, 941, 283, 223]]

test_actual['Predict'] = data2
test_actual

Any opinions or help would be much appreciated. Thank you!

Answer 1

Without further details (e.g. how many classes there are or how long the dataset is), apply seems to be the only viable choice:

test_actual.apply(
    lambda x: set(x['Actual']).intersection(set(x['Predict'])),
    axis=1
)

Output:

userId
1                        {32}
2                       {258}
3                  {258, 260}
5                        {28}
6                       {526}
8                         {9}
10                     {1028}
12                       {12}
15                  {38, 343}
18    {480, 385, 69, 38, 542}
dtype: object
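
The question asks for the number of shared items rather than the items themselves, so one more step may be needed. Here is a minimal sketch of that step, assuming the same layout as test_actual (one list per cell). The frame df and the names common and n_common are made up for illustration and are not part of the original post:

```python
import pandas as pd

# Hypothetical miniature of test_actual: one list of item ids per cell
df = pd.DataFrame(
    {
        "Actual": [[32, 256, 5], [258, 260, 264, 11], [9, 10, 11]],
        "Predict": [[154, 32, 480], [322, 258, 260], [237, 154]],
    },
    index=pd.Index([1, 3, 5], name="userId"),
)

# Row-wise set intersection, same idea as the snippet above
common = df.apply(lambda row: set(row["Actual"]) & set(row["Predict"]), axis=1)

# The size of each set is the number of items present in both columns
n_common = common.apply(len)
print(n_common)
```

A user with no overlap gets an empty set and therefore a count of 0.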

Answer 2

IIUC, you can use NumPy's intersect1d:

test_actual.apply(lambda x: len(np.intersect1d(x['Actual'],x['Predict'])), axis = 1)

userId
1     1
2     1
3     2
5     1
6     1
8     1
10    1
12    1
15    2
18    5

If you are interested in the values and not the count, use:

test_actual.apply(lambda x: np.intersect1d(x['Actual'],x['Predict']), axis = 1)

userId
1                        [32]
2                       [258]
3                  [258, 260]
5                        [28]
6                       [526]
8                         [9]
10                     [1028]
12                       [12]
15                  [38, 343]
18    [38, 69, 385, 480, 542]
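
Note that np.intersect1d returns sorted, unique values, so an item repeated within a list is counted once. If apply with axis=1 turns out to be slow on a larger frame, a plain comprehension over the two columns might be worth trying instead. The following is only a sketch under that assumption: df, counts and the n_common column are hypothetical names, and no timing was measured here:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of test_actual: one list of item ids per cell
df = pd.DataFrame(
    {
        "Actual": [[32, 256, 5], [258, 260, 264, 11], [9, 10, 11]],
        "Predict": [[154, 32, 480], [322, 258, 260], [237, 154]],
    },
    index=pd.Index([1, 3, 5], name="userId"),
)

# Walk both columns in lockstep and count the shared items per row
counts = pd.Series(
    [len(np.intersect1d(a, p)) for a, p in zip(df["Actual"], df["Predict"])],
    index=df.index,
)

# Optionally keep the result next to the original columns
df["n_common"] = counts
print(df["n_common"])
```

This gives the same counts as the apply version, since both call np.intersect1d on each pair of lists.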


solution1
1 2020-03-05 19:07:12

solution2
1 ACCEPTED 2020-03-05 19:07:32
