
Read TSV file in pyspark

What is the best way to read a .tsv file with a header in PySpark and store it in a Spark DataFrame?

I am trying to use the "spark.read.options" and "spark.read.csv" commands, but with no luck.

Thanks.

Regards, Jit

You can read the TSV file directly without providing an external schema, as long as a header row is available:

df = spark.read.csv(path, sep=r'\t', header=True).select('col1','col2')

Since Spark is lazily evaluated, it will read only the selected columns. Hope it helps.
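For completeness, here is a minimal self-contained sketch using the spark.read.options form mentioned in the question; the file path and the use of inferSchema are illustrative assumptions, not part of the original answer:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("read_tsv").getOrCreate()

# sep="\t" splits fields on tabs, header=True uses the first row as column names,
# inferSchema=True asks Spark to guess column types instead of reading everything as strings.
df = (spark.read
      .options(sep="\t", header=True, inferSchema=True)
      .csv("/path/to/data.tsv"))  # placeholder path

df.printSchema()
df.show(5)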
