I have Python scripts (no Spark involved) that produce data files, and I want those files to be easy to read as DataFrames in a Scala/Spark application.
What's the best format to choose?
If your data doesn't contain newlines, then a simple text-based format such as TSV is probably best.
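As a rough illustration of the TSV route, here is a minimal sketch of the Python side. The helper name `write_tsv` and the sample columns are made up for the example, and it assumes you would rather fail loudly than silently write a field containing a tab or newline.

```python
import io


def write_tsv(rows, header, out):
    """Write a header line and one tab-separated line per row.

    Hypothetical helper: it does no quoting or escaping, so it rejects
    any field that contains a tab, newline, or carriage return.
    """
    out.write("\t".join(header) + "\n")
    for row in rows:
        fields = [str(value) for value in row]
        for field in fields:
            if any(ch in field for ch in "\t\n\r"):
                raise ValueError(f"field contains tab or newline: {field!r}")
        out.write("\t".join(fields) + "\n")


# Example with made-up data, written to an in-memory buffer.
buf = io.StringIO()
write_tsv([(1, "alice"), (2, "bob")], ["id", "name"], buf)
```

On the Spark side, a file written this way can usually be loaded with something like `spark.read.option("sep", "\t").option("header", "true").csv(path)`; all columns arrive as strings unless you supply a schema or enable schema inference.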
If you need to include binary data, then a structured format such as protobuf makes sense. Anything for which a Hadoop InputFormat exists should be fine.