
Spark - Strange characters when reading CSV file

I hope someone could help me please. My problem is the following:

To read a CSV file in Spark I'm using the code

val df=spark.read.option("header","true").option("inferSchema","true").csv("/home/user/Documents/filename.csv")

assuming that my file is called filename.csv and the path is /home/user/Documents/

To show the first 10 results I use

df.show(10)

but instead I get the following result, which contains strange characters and does not show the 10 rows as desired:

scala> df.show(10)
+--------+---------+---------+-----------------+                                
|     c1|      c2|      c3|              c4|
+--------+---------+---------+-----------------+
|��1.0|5450|3007|20160101|
+--------+---------+---------+-----------------+

The CSV file looks something like this:

c1  c2      c3      c4
1   5450    3007    20160101
2   2156    1414    20160107
1   78229   3656    20160309
1   34963   4484    20160104
1   7897    3350    20160105
11  13247   3242    20160303
2   4957    3350    20160124
1   73083   4211    20160207

The file that I'm trying to read is big. When I try smaller files I don't get the strange characters, and I can see the first 10 results without a problem.

Any help is appreciated.

Sometimes the problem is not caused by Spark's settings. Try re-saving (Save As) your CSV file as "CSV UTF-8 (comma delimited)", then rerun your code, and the strange characters will be gone. I had a similar problem when reading a CSV file containing German words; after doing the above, it all worked.
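If you can't conveniently re-save the file, you can also tell Spark which encoding to use when reading it. This is a sketch, assuming the file was exported in a non-UTF-8 encoding such as UTF-16 (a leading `��` is often a byte-order mark or mis-decoded bytes); the correct value for your file may differ:

```scala
// Spark's CSV reader accepts an "encoding" option (alias "charset"),
// so the file can be read as-is instead of re-saving it.
// "UTF-16" here is an assumption -- substitute the file's actual encoding.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("encoding", "UTF-16")
  .csv("/home/user/Documents/filename.csv")

df.show(10)
```

If you aren't sure of the encoding, inspecting the first few bytes of the file (for example with `hexdump -C filename.csv | head` on Linux) will usually reveal a BOM such as `FF FE` (UTF-16LE) or `EF BB BF` (UTF-8).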
