For a particular task I'm working on, I have a dataset of about 25 GB. I'm still experimenting with several preprocessing methods and definitely don't have my data in its final form yet. I'm not sure what the common workflow is for this sort of problem, so here is what I'm thinking:
The alternative approach is this:
Which of these approaches is suggested? Which will incur fewer charges, and which is used most often when running these kinds of operations? Is there a different workflow that I'm not seeing here?
On the billing side, the charges would be the same, since FUSE operations are charged like any other Cloud Storage interface, according to the documentation. I don't know how you are going to train on the data in your use case, but if you perform more than one operation on the files, it would be better to download them, train locally, and then upload the final result, which would be 2 object operations. If, for example, you make more than one change or read to a file during training through the FUSE mount, every one of those would be an object operation. On the workflow side, the proposed one looks good to me.
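To make the operation counting concrete, here is a rough sketch in Python. It is a simplified model, not a billing calculator: the function names and the example numbers are made up for illustration, and it ignores metadata and list calls, any caching the FUSE layer might do, and the different price classes of operations.

```python
def ops_download_then_upload(num_objects: int = 1, num_results: int = 1) -> int:
    """Object operations when data is copied to local disk once,
    processed locally, and only the final result is written back."""
    # One read per input object, one write per result object.
    return num_objects + num_results


def ops_through_mount(num_objects: int, reads_per_object: int,
                      writes_per_object: int = 0, num_results: int = 1) -> int:
    """Object operations when every read or change during training
    goes to the bucket (assumption: nothing is cached)."""
    per_object = reads_per_object + writes_per_object
    return num_objects * per_object + num_results


# Hypothetical example: the dataset is a single archive object,
# and training reads it once per epoch for 10 epochs.
local_ops = ops_download_then_upload(num_objects=1, num_results=1)
mount_ops = ops_through_mount(num_objects=1, reads_per_object=10)

print("download, train locally, upload:", local_ops)
print("read through the mount each epoch:", mount_ops)
```

With a single object the first case gives the 2 operations mentioned above, while the second grows with every extra pass over the data. If the dataset is split into many small files, both counts scale with the number of files, so the gap widens accordingly.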