How to use the replay buffer in tf_agents for a contextual bandit that predicts and trains on a daily basis

I am using the tf_agents library for a contextual-bandits use case.

In this use case, predictions (between 20k and 30k per day, one for each user) are made multiple times a day, and training happens only on the data predicted 4 days earlier (since the labels for the predictions take 3 days to observe).

The driver seems to replay only batch_size experiences at a time (since the maximum step length is 1 for contextual bandits), and the replay buffer has the same constraint, holding only batch_size experiences.
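For reference, my setup looks roughly like the standard TF-Agents bandits wiring (a sketch; `environment` and `agent` stand in for my actual batched environment and bandit agent):

```python
# Rough sketch of the current setup; `environment` and `agent` are
# placeholders for my actual batched TF environment and bandit agent.
from tf_agents.drivers import dynamic_step_driver
from tf_agents.replay_buffers import tf_uniform_replay_buffer

BATCH_SIZE = 16      # environment batch size
STEPS_PER_LOOP = 1   # max step length is 1 for contextual bandits

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.policy.trajectory_spec,
    batch_size=BATCH_SIZE,
    max_length=STEPS_PER_LOOP)  # so the buffer only ever holds one batch

driver = dynamic_step_driver.DynamicStepDriver(
    env=environment,
    policy=agent.collect_policy,
    num_steps=STEPS_PER_LOOP * BATCH_SIZE,
    observers=[replay_buffer.add_batch])

driver.run()  # collects BATCH_SIZE experiences, overwriting the previous ones
```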

I want to use a checkpointer to save all the predictions (the experience from the driver, which is stored in the replay buffer) from the past 4 days, and on each given day train only on the earliest of the 4 saved days.

I am unsure how to do the following, and any help is greatly appreciated.

  1. How to (run the driver and) save the replay buffer using checkpoints for an entire day (a day contains, say, 3 prediction runs, and each run makes predictions on 30,000 observations [with, say, a batch size of 16]). In this case I need multiple saves per day.
  2. How to save the replay buffers for the past 4 days (12 prediction runs) and retrieve only the first 3 prediction runs (the replay buffer and the driver run) to train on each day.
  3. How to configure the driver, the replay buffer, and the checkpointer given #1 and #2 above.

On the replay buffer side, I don't think there is any way to get this working without implementing your own RB class (which I wouldn't necessarily recommend). The most straightforward solution, it seems to me, is to take the memory-inefficiency hit and keep two RBs with different values of max_length. One of the two is given to the driver to store episodes, and then rb.as_dataset(single_deterministic_pass=True) is used to pull out the appropriate items and place them in the second one, which is used for training. The only buffer you need to checkpoint, of course, is the first one.
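Roughly, the wiring could look like this (a sketch, not a tested implementation: `agent` stands for your bandit agent, the checkpoint path is illustrative, and the take arithmetic for selecting the oldest day is a placeholder that depends on how the deterministic pass orders frames; see the note below):

```python
import tensorflow as tf
from tf_agents.replay_buffers import tf_uniform_replay_buffer
from tf_agents.utils import common

BATCH_SIZE = 16
FRAMES_PER_RUN = 30_000 // BATCH_SIZE  # driver batches in one prediction run
RUNS_PER_DAY = 3
DAYS_TO_KEEP = 4

# RB #1: long-lived buffer the driver writes into, sized to hold 4 days.
collect_rb = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.policy.trajectory_spec,
    batch_size=BATCH_SIZE,
    max_length=DAYS_TO_KEEP * RUNS_PER_DAY * FRAMES_PER_RUN)

# RB #2: staging buffer holding only the single day you train on.
train_rb = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.policy.trajectory_spec,
    batch_size=BATCH_SIZE,
    max_length=RUNS_PER_DAY * FRAMES_PER_RUN)

# Only the first buffer needs to be checkpointed.
rb_checkpointer = common.Checkpointer(
    ckpt_dir='/tmp/bandit_rb',  # illustrative path
    max_to_keep=1,
    replay_buffer=collect_rb)
rb_checkpointer.initialize_or_restore()

# Copy one day's worth of items into the training buffer. Which slice of
# the deterministic pass corresponds to the oldest day depends on the
# cursor position (see the note below), so treat this take() as a
# placeholder; the regrouping into batches also won't match the original
# batch rows, which shouldn't matter for bandit training.
day_ds = collect_rb.as_dataset(
    sample_batch_size=BATCH_SIZE,
    single_deterministic_pass=True).take(RUNS_PER_DAY * FRAMES_PER_RUN)
for traj, _ in day_ds:
    train_rb.add_batch(traj)

# After collection/training, persist the long-lived buffer again.
rb_checkpointer.save(global_step=tf.constant(0))  # use your real step counter
```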

Note: I'm not sure off the top of my head exactly how single_deterministic_pass works; you may want to check it in order to determine which portion of the returned dataset corresponds to the day you want to train on. I also suspect that the portion corresponding to the last day shifts, because, if I remember correctly, the RB table that stores the experiences works with a cursor that, once it reaches the maximum length, starts overwriting from the beginning.
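One way to check this empirically (assuming, as I believe, that the ids in the BufferInfo returned alongside each item reflect insertion order) is to print them and look for the point where they jump:

```python
# Continuing from the sketch above: inspect the frame ids returned with
# each item; a discontinuity in the ids shows where the cursor wrapped.
ds = collect_rb.as_dataset(single_deterministic_pass=True)
for trajectory, buffer_info in ds.take(10):
    print(buffer_info.ids.numpy())
```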

Neither RB needs to know about the logic of how many prediction runs there are; in the end your code should manage that logic, and you might want to keep track (maybe in a pickle, if you want to persist this) of how many predictions correspond to each day, so that you know which ones to pick.
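For example, something along these lines (the ledger path and layout are just one way to do it):

```python
import datetime
import os
import pickle

LEDGER_PATH = '/tmp/bandit_rb/run_ledger.pkl'  # illustrative path

def load_ledger():
    """Returns {iso_date: [frames_added_in_run_1, frames_added_in_run_2, ...]}."""
    if os.path.exists(LEDGER_PATH):
        with open(LEDGER_PATH, 'rb') as f:
            return pickle.load(f)
    return {}

def record_run(ledger, frames_added):
    """Appends one prediction run's frame count under today's date and saves."""
    today = datetime.date.today().isoformat()
    ledger.setdefault(today, []).append(frames_added)
    with open(LEDGER_PATH, 'wb') as f:
        pickle.dump(ledger, f)

def frames_for_oldest_day(ledger, days_to_keep=4):
    """Frame count of the oldest retained day, i.e. the one to train on."""
    days = sorted(ledger)[-days_to_keep:]
    return sum(ledger[days[0]]) if days else 0
```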
