
SparkSQL not reading multi-delimiter CSV file

I'm trying to read a multi-delimiter (|, ||) CSV file using PySpark SQL, but I'm not able to read any data from the DataFrame; it gives 0 records.

Sample data from the CSV file:

Newyork|234567|company Ltd||PIN

df = spark.read.option("sep", "|").option("header", "true").format("csv").load(csv)

I need to read the data. Is there any other way to handle this?

Try this -

val df = spark.read
  .option("sep", "|")
  .option("header", "true")
  .csv(
    // Pre-process the raw lines: collapse the double delimiter "||"
    // into a single "|" before handing them to the CSV parser
    spark.read.text("<path>").as(Encoders.STRING)
      .map(_.replaceAll("\\|\\|", "|"))(Encoders.STRING)
  )
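The core idea above, normalizing the double delimiter `||` to a single `|` before parsing, can be sketched in plain Python (using the sample row from the question; this is an illustration of the transformation, not Spark code):

```python
import csv
import io

# Raw line that mixes "|" and "||" as delimiters (sample from the question)
raw = "Newyork|234567|company Ltd||PIN\n"

# Collapse the double delimiter into a single one before parsing
normalized = raw.replace("||", "|")

# Parse with the csv module using "|" as the separator
reader = csv.reader(io.StringIO(normalized), delimiter="|")
rows = list(reader)
print(rows)  # [['Newyork', '234567', 'company Ltd', 'PIN']]
```

In PySpark the same pre-processing should be doable before reading, e.g. by mapping `lambda r: r[0].replace("||", "|")` over the lines of `spark.read.text(path)` and passing the result to `spark.read.csv`, which also accepts a dataset of strings in Spark 2.2+.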
