
A file format writable by Python, readable as a DataFrame in Spark

I have Python scripts (no Spark involved) producing some data files that I want to be easily readable as DataFrames in a Scala/Spark application.

What's the best choice?

If your data doesn't contain newlines, then a simple text-based format such as TSV is probably best.
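
For the TSV route, the Spark side is just the CSV reader with a tab separator. Below is a minimal Scala sketch; the file name `data.tsv` and the header/schema options are assumptions about what the Python script writes, not something stated in the question.

```scala
import org.apache.spark.sql.SparkSession

object ReadTsv {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadTsv")
      .getOrCreate()

    // Read the tab-separated file written by the Python script as a DataFrame.
    // "data.tsv", header and inferSchema are assumptions; pass an explicit
    // schema instead of inferSchema if you want stable column types.
    val df = spark.read
      .option("sep", "\t")
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data.tsv")

    df.show()
    spark.stop()
  }
}
```

On the Python side the contract is simply a tab-delimited text file with one record per line, which is why the "no embedded newlines" caveat matters.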

If you need to include binary data, then a serialization format like Protobuf makes sense; anything for which a Hadoop InputFormat exists should be fine.
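
For the binary route, one concrete (hedged) option is to have Python write the payloads into a Hadoop SequenceFile and read it through the InputFormat machinery on the Spark side. The sketch below assumes a file `records.seq` with Text keys and BytesWritable values; a Protobuf-specific InputFormat from a third-party library would slot in the same way.

```scala
import org.apache.hadoop.io.{BytesWritable, Text}
import org.apache.spark.sql.SparkSession

object ReadSequenceFile {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadSequenceFile")
      .getOrCreate()
    import spark.implicits._

    // Load key/value records via the SequenceFile InputFormat.
    // "records.seq", Text keys and BytesWritable values are assumptions.
    val rdd = spark.sparkContext
      .sequenceFile("records.seq", classOf[Text], classOf[BytesWritable])
      // Hadoop reuses Writable instances, so copy the data out immediately.
      .map { case (k, v) => (k.toString, v.copyBytes()) }

    // The byte payload becomes a BinaryType column; decode it (e.g. parse the
    // Protobuf message) in a later map or UDF as needed.
    val df = rdd.toDF("key", "payload")
    df.printSchema()
    spark.stop()
  }
}
```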
