I hope someone can help me. My problem is the following:
To read a CSV file in Spark I'm using the code
val df=spark.read.option("header","true").option("inferSchema","true").csv("/home/user/Documents/filename.csv")
assuming that my file is called filename.csv
and the path is /home/user/Documents/
To show the first 10 results I use
df.show(10)
but instead I get the following result, which contains strange characters (�) and does not show the 10 rows as desired:
scala> df.show(10)
+--------+---------+---------+-----------------+
| c1| c2| c3| c4|
+--------+---------+---------+-----------------+
|��1.0|5450|3007|20160101|
+--------+---------+---------+-----------------+
The CSV file looks something like this
c1 c2 c3 c4
1 5450 3007 20160101
2 2156 1414 20160107
1 78229 3656 20160309
1 34963 4484 20160104
1 7897 3350 20160105
11 13247 3242 20160303
2 4957 3350 20160124
1 73083 4211 20160207
The file that I'm trying to read is big. When I try smaller files, I don't get the strange characters and I can see the first 10 rows without a problem.
Any help is appreciated.
Sometimes the problem is not caused by Spark's settings at all. Try re-saving (Save As) your CSV file as "CSV UTF-8 (comma delimited)", then rerun your code; the strange characters will be gone. I had a similar problem when reading a CSV file containing German words, and after doing the above it worked fine.
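On Linux/macOS you can do the same conversion from the command line instead of re-saving in Excel. A minimal sketch, using the path from the question; the source encoding (UTF-16 here) is an assumption — check what `file` reports first and substitute that:

```shell
# Inspect the file's current encoding (path taken from the question)
file -i /home/user/Documents/filename.csv

# Convert to UTF-8; "UTF-16" is an assumed source encoding —
# replace it with whatever `file` reported above
iconv -f UTF-16 -t UTF-8 /home/user/Documents/filename.csv \
  > /home/user/Documents/filename-utf8.csv
```

Then point spark.read at the converted file. Alternatively, Spark's CSV reader accepts an encoding option, e.g. .option("encoding", "UTF-16"), so you can read the original file directly once you know its charset.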