
How to save data you've already loaded and processed in a Google Colab notebook so you don't have to reload it every time?

I've read about pickling with the pickle library, but does that only save a model you've trained, rather than, say, the actual dataframe you loaded into a variable from a huge CSV file?

This example notebook has some examples of different ways to save and load data.

You can actually use pickle to save any Python object, including Pandas DataFrames; however, it's more usual to serialize with one of Pandas' own methods, such as pandas.DataFrame.to_csv, to_feather, etc.
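For instance, a minimal sketch of both approaches (the file paths below are arbitrary placeholders; to_feather requires pyarrow to be installed):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Serialize with pickle (works for any Python object, including DataFrames).
df.to_pickle("/tmp/df.pkl")
df_restored = pd.read_pickle("/tmp/df.pkl")

# Or use one of Pandas' own serializers.
df.to_csv("/tmp/df.csv", index=False)
df.to_feather("/tmp/df.feather")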

I would probably recommend the option that uses the GCS command-line tool, which you can run from inside your notebook by prefixing the command with !

import pandas as pd
# Create a local file to upload.
df = pd.DataFrame([1,2,3])
df.to_csv("/tmp/to_upload.txt")

# Copy the file to our new bucket.
# Full reference: https://cloud.google.com/storage/docs/gsutil/commands/cp
!gsutil cp /tmp/to_upload.txt gs://my-bucket/
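In a later session you can copy the file back from the bucket and load it into a DataFrame again. This is a sketch under the same assumptions as above (gs://my-bucket/ is a placeholder bucket name, and gsutil in Colab generally needs you to authenticate first):

# Copy the file back from the bucket into the local filesystem.
!gsutil cp gs://my-bucket/to_upload.txt /tmp/to_upload.txt

import pandas as pd
# The file was written with the index included, so read it back as the first column.
df = pd.read_csv("/tmp/to_upload.txt", index_col=0)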
