As shown in the picture above, how can I find the number of items that appear in both the 'Actual' and 'Predict' columns for each userId? The object is a pandas.core.frame.DataFrame.
The code to construct the example table is as follows:
import pandas as pd
import numpy as np
# initialize list of lists
data = pd.DataFrame(np.array([[32, 256, 5, 102, 74, 171, 270, 111, 209, 24],
[1, 258, 257, 281, 10, 269, 14, 13, 272, 273],
[258, 260, 264, 11, 271, 288, 294, 300, 301],
[9, 10, 11, 12, 22, 28],
[1, 514, 2, 516, 4, 13, 526, 527, 1037, 529, 256, 678],
[1, 1028, 7, 9, 1033, 15, 1047, 25, 546, 1061],
[258, 259, 514, 261, 131, 135, 520, 265, 1028, 50],
[2, 11, 12, 526, 1044, 22, 23, 27, 541, 54, 88],
[332, 168, 79, 343, 38, 1007, 9, 232, 381, 1079],
[38, 168, 561, 542, 69, 20, 79, 385, 332, 480]], dtype=object))  # rows differ in length, so recent NumPy needs dtype=object
test_actual = data.rename(columns={0: "Actual"})
test_actual['userId'] = [1,2,3,5,6,8,10,12,15,18]
test_actual = test_actual.set_index('userId')
data2 = [[154, 248, 237, 223, 83, 283, 69, 32, 480, 325],
[332, 168, 38, 9, 385, 258, 561, 41, 79, 542],
[322, 258, 226, 232, 1007, 343, 332, 260, 561, 381],
[237, 154, 196, 223, 523, 277, 226, 748, 323, 28],
[168, 332, 38, 9, 83, 561, 232, 526, 1007, 20],
[79, 38, 480, 168, 232, 561, 653, 9, 542, 996],
[9, 232, 332, 523, 168, 322, 7, 1028, 41, 542],
[83, 168, 232, 322, 385, 223, 154, 941, 283, 12],
[69, 38, 196, 480, 83, 385, 20, 343, 283, 542],
[480, 38, 69, 83, 385, 154, 542, 941, 283, 223]]
test_actual['Predict'] = data2
test_actual
Your opinion and help would be much appreciated! Thank you!
Without further details (e.g., how many classes there are or how long the dataset is), apply
seems to be the only viable choice:
(test_actual
.apply(lambda x: set(x['Actual']).intersection(set(x['Predict'])),
axis=1)
)
Output:
userId
1 {32}
2 {258}
3 {258, 260}
5 {28}
6 {526}
8 {9}
10 {1028}
12 {12}
15 {38, 343}
18 {480, 385, 69, 38, 542}
dtype: object
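Since the question asks for the number of shared items rather than the items themselves, the set intersection above can be wrapped in len. The following is a minimal sketch, not part of the original answer: it rebuilds only the first three users from the question's data, and the names df and common_count are chosen here for illustration.

```python
import pandas as pd

# First three users from the question's example (illustrative subset).
df = pd.DataFrame(
    {
        "Actual": [
            [32, 256, 5, 102, 74, 171, 270, 111, 209, 24],
            [1, 258, 257, 281, 10, 269, 14, 13, 272, 273],
            [258, 260, 264, 11, 271, 288, 294, 300, 301],
        ],
        "Predict": [
            [154, 248, 237, 223, 83, 283, 69, 32, 480, 325],
            [332, 168, 38, 9, 385, 258, 561, 41, 79, 542],
            [322, 258, 226, 232, 1007, 343, 332, 260, 561, 381],
        ],
    },
    index=pd.Index([1, 2, 3], name="userId"),
)

# Size of the set intersection per row; duplicates within a row count once.
common_count = df.apply(
    lambda row: len(set(row["Actual"]) & set(row["Predict"])), axis=1
)
print(common_count)
```

For these three users the counts come out as 1, 1 and 2, matching the sets shown in the output above.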
IIUC, you can use NumPy's intersect1d:
test_actual.apply(lambda x: len(np.intersect1d(x['Actual'],x['Predict'])), axis = 1)
userId
1 1
2 1
3 2
5 1
6 1
8 1
10 1
12 1
15 2
18 5
If you are interested in the values and not the count, use:
test_actual.apply(lambda x: np.intersect1d(x['Actual'],x['Predict']), axis = 1)
userId
1 [32]
2 [258]
3 [258, 260]
5 [28]
6 [526]
8 [9]
10 [1028]
12 [12]
15 [38, 343]
18 [38, 69, 385, 480, 542]
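Both approaches treat each row as a set, so an item repeated within a row is counted once (np.intersect1d also returns its result sorted). If a row-wise apply turns out to be slow on a longer table, a plain list comprehension over the two columns is one possible alternative. This is only a sketch under the same assumptions as the question's data; df and n_common are illustrative names, and only the first three users are rebuilt here.

```python
import pandas as pd

# First three users from the question's example (illustrative subset).
df = pd.DataFrame(
    {
        "Actual": [
            [32, 256, 5, 102, 74, 171, 270, 111, 209, 24],
            [1, 258, 257, 281, 10, 269, 14, 13, 272, 273],
            [258, 260, 264, 11, 271, 288, 294, 300, 301],
        ],
        "Predict": [
            [154, 248, 237, 223, 83, 283, 69, 32, 480, 325],
            [332, 168, 38, 9, 385, 258, 561, 41, 79, 542],
            [322, 258, 226, 232, 1007, 343, 332, 260, 561, 381],
        ],
    },
    index=pd.Index([1, 2, 3], name="userId"),
)

# Pair the two columns row by row and count the shared items,
# keeping the userId index on the result.
n_common = pd.Series(
    [len(set(a) & set(p)) for a, p in zip(df["Actual"], df["Predict"])],
    index=df.index,
    name="n_common",
)
print(n_common)

# Total number of shared items across all users, if a single figure is wanted.
print(n_common.sum())
```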