Efficiently compute all combinations (row-wise) of two matrices as input for a pretrained binary classifier in TensorFlow?

I am running an evaluation for my research in the field of recommender systems in TensorFlow. I have pretrained a binary classifier that outputs a value within [0, 1]. It requires a user and an item representation that are further transformed into embeddings within the model.

I have 100,000 users, each represented by 3,283 features, so the user matrix has dimensions 100000x3283. There are ~1.7M items, each represented by nearly the same number of features (3,277). The features are further split into continuous and categorical parts. Thus, there are 4 placeholders in the model that expect values with the feature quantity in parentheses:

  • user continuous (12)
  • user categorical (3271)
  • item continuous (6)
  • item categorical (3271)

I need to compute the respective value for each of the >10^11 combinations and collect the indices and output values of the best 1000 items for each user, ranked by the highest network output.

What would be the most efficient way of doing that?

Having come across the tf.nn.top_k function, I am still unsure whether to preload the data (partially) into a tf.Variable (Constants use too much memory because of the internal copies TF makes), or feed it through a numpy array, and how to form the combinations inside the graph. I experimented with loops and np.repeat around the session runs, but that was too memory-intensive and too slow. So the code below was a first attempt - knowingly inefficient, but at least something that works:

sess = tf.Session()
saver.restore(sess, tf.train.latest_checkpoint('logging_embed/'))

start = time.time()
# score one user against every item: repeat the user's features once per item
results = sess.run(out_layer, feed_dict={user_cont: np.repeat(np.atleast_2d(profiles[user, :12]), len(items), axis=0),
                                         user_cat: np.repeat(np.atleast_2d(profiles[user, 12:]), len(items), axis=0),
                                         item_cont: np.atleast_2d(items[:, :6]),
                                         item_cat: np.atleast_2d(items[:, 6:])})
print(str(time.time() - start))
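To avoid holding scores for all 1.7M items at once, the items can be scored in chunks while keeping a running top-k per user. A minimal numpy sketch, with a stand-in `score_fn` in place of the actual `sess.run` call (which is an assumption, not the poster's model):

```python
import numpy as np

def top_k_over_chunks(user_vec, items, score_fn, k=1000, chunk=200_000):
    """Score one user against all items in chunks, keeping a running top-k."""
    best_scores = np.full(k, -np.inf)
    best_idx = np.full(k, -1, dtype=np.int64)
    for start in range(0, len(items), chunk):
        scores = score_fn(user_vec, items[start:start + chunk])
        # merge the chunk with the running best and keep the k largest
        all_scores = np.concatenate([best_scores, scores])
        all_idx = np.concatenate([best_idx,
                                  np.arange(start, start + len(scores))])
        keep = np.argpartition(all_scores, -k)[-k:]
        best_scores, best_idx = all_scores[keep], all_idx[keep]
    order = np.argsort(-best_scores)         # sort only the final k entries
    return best_idx[order], best_scores[order]
```

With `chunk` set to 200k-300k items, only one chunk's scores need to live in memory at a time, and the final sort touches just k values per user.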

I am running experiments on two Tesla K80s with about 23 GB of GPU RAM in total, but I have already noticed that the tf.float32 Variable representation is 3-4 times larger than it should be, so chunking the items would be necessary, e.g. into blocks of 200k or 300k.

Thanks in advance for any helpful suggestions!

I don't know if this is the most efficient way, but I would approach the problem as follows. Make your model take n user and m item profiles as input and output an n×m matrix of outputs - one for each user/item pair in the input. Once you have this model, you can experiment with the values of n and m that work well on your hardware. Then it is just a matter of invoking this model with the right chunks of users/items and updating the current best items for each user after each model invocation.
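The pairing itself amounts to repeating each of the n user rows m times and tiling the m item rows n times; the same arithmetic can be done inside the graph with tf.tile and tf.reshape. A small numpy sketch of the index logic (function name and shapes are illustrative, not from the question):

```python
import numpy as np

def all_pairs(users, items):
    """Build inputs covering every (user, item) pair: n*m rows each."""
    n, m = len(users), len(items)
    u = np.repeat(users, m, axis=0)   # each user row repeated m times
    it = np.tile(items, (n, 1))       # the whole item block repeated n times
    return u, it                      # feed these; reshape the output to (n, m)
```

Feeding `u` and `it` (split into their continuous/categorical columns) into the four placeholders yields n*m scores that reshape directly into the n×m matrix.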

The maintenance should be manageable and can probably be done on CPU: 1000 items for 100k users → 4 bytes × 1000 × 100k = 400 MB matrix. If you keep the item scores for each user sorted, updates can be fairly cheap. Also, you will be dealing with only n users at a time.
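Keeping each user's top list sorted makes the per-chunk update cheap because most chunks can be rejected in O(1) by comparing against the current k-th best score. A hedged sketch of that merge step (names are illustrative):

```python
import numpy as np

def merge_topk(best_scores, best_idx, new_scores, new_idx, k):
    """Merge a chunk's candidates into a descending-sorted top-k list."""
    # cheap early exit: nothing in the chunk beats the current k-th best
    if len(best_scores) == k and new_scores.max() <= best_scores[-1]:
        return best_scores, best_idx
    s = np.concatenate([best_scores, new_scores])
    i = np.concatenate([best_idx, new_idx])
    order = np.argsort(-s)[:k]        # keep the k largest, still sorted
    return s[order], i[order]
```

Applied after every model invocation for the n users in the current block, this keeps the 400 MB result structure up to date without ever re-sorting full score vectors.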

