A file format writable by Python, readable as a DataFrame in Spark
I have Python scripts (no Spark here) producing some data files that I want to be easily readable as DataFrames in a Scala/Spark application.
What's the best choice?
If your data doesn't contain newlines, then a simple text-based format such as TSV is probably best.
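As a minimal sketch of the TSV route: the Python side only needs the standard-library `csv` module with a tab delimiter. The file name `data.tsv` and the columns are made up for illustration.

```python
import csv

# Hypothetical example rows; any flat records without embedded
# newlines or tabs work with this format.
rows = [
    {"id": 1, "name": "alice", "score": 0.9},
    {"id": 2, "name": "bob", "score": 0.7},
]

with open("data.tsv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name", "score"], delimiter="\t")
    writer.writeheader()  # header row lets Spark infer column names
    writer.writerows(rows)
```

On the Spark side this reads back with the built-in CSV reader, e.g. `spark.read.option("sep", "\t").option("header", "true").csv("data.tsv")` in Scala, optionally with `inferSchema` enabled.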
If you need to include binary data, then a serialized, length-delimited format like protobuf makes sense; anything for which a Hadoop InputFormat exists should be fine.