Spark: create a csv file (must use Scala and DataFrame)
I'm learning Scala and DataFrames recently and I ran into a problem. It is about DataFrames. It must be solved using Scala and the DataFrame API, but NOT Spark SQL.
Problem:
Create a csv file with 4 columns (person, class, subject, score) for a school and put some random data into the csv. Each person must have "Maths", "English", and "Art" plus some other subjects, and there must be at least 3 classes.
Write a Spark program to:
read a csv file
show the full data table
show how many persons per class
show the person with the highest score in "Maths", along with that score
I have tried to solve it and googled it, but everything I came up with uses SQL, and SQL is also the first solution Google gives. I really don't know how to do it with Spark and the DataFrame API rather than Spark SQL, even though the tutorial said it was a very easy question :(
Could anyone help me with it, e.g. write an example for me or point me to one? Thank you so much, I would really appreciate it.
Sample csv file:
+-------+-------+---------+-------+
| name  | class | subject | marks |
+-------+-------+---------+-------+
| ab    |    12 | Maths   |    72 |
| abc   |    12 | Maths   |    88 |
| abcd  |    11 | Arts    |    92 |
| abcde |    12 | English |    88 |
| bc    |    11 | Maths   |    99 |
| bcd   |    12 | English |    55 |
| bcde  |    11 | English |    77 |
| axax  |    10 | Maths   |    83 |
| amam  |    10 | English |    65 |
| arar  |    10 | Arts    |    66 |
+-------+-------+---------+-------+
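The first step of the problem, generating such a csv with random data, needs no Spark at all. A minimal sketch in plain Scala, assuming a file name of "school.csv", a score range of 50-100, and "History"/"Science" as the extra subjects (none of these are specified in the question):

```scala
import java.io.PrintWriter
import scala.util.Random

// Generate a random school csv. File name, extra subjects, and score
// range are assumptions, not part of the original question.
val rnd      = new Random(42)                 // fixed seed for reproducibility
val classes  = Seq("10", "11", "12")          // at least 3 classes
val required = Seq("Maths", "English", "Art") // subjects every person must have
val extras   = Seq("History", "Science")      // "plus some other subjects"
val names    = Seq("ab", "abc", "abcd", "abcde", "bc", "bcd")

val rows = for {
  (name, i) <- names.zipWithIndex
  cls        = classes(i % classes.length)            // spread persons across classes
  subject   <- required ++ rnd.shuffle(extras).take(1) // 3 required + 1 random extra
} yield s"$name,$cls,$subject,${50 + rnd.nextInt(51)}" // random score in 50-100

val out = new PrintWriter("school.csv")
out.println("name,class,subject,marks")       // header row
rows.foreach(out.println)
out.close()
```

Each person gets one row per subject, so 6 names with 4 subjects each produce 24 data rows plus the header.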
Read the csv file (inferSchema makes Spark derive the column types from the data):
import org.apache.spark.sql.functions._
val df = spark.read.option("inferSchema","true").option("header","true").csv(filePath)
Show the dataframe:
df.show()
Show how many persons per class (each person has one row per subject, so count distinct names rather than rows):
df.groupBy("class").agg(countDistinct("name").as("persons")).show()
Show the person with the highest score in "Maths":
df.filter(col("subject") === "Maths").orderBy(desc("marks")).limit(1).show()
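One caveat: `limit(1)` keeps an arbitrary single row if several people tie for the top Maths score. A sketch that keeps all top scorers, still using only the DataFrame API (it reuses `df` from above and assumes the `org.apache.spark.sql.functions._` import):

```scala
// All top Maths scorers (handles ties), DataFrame API only.
val maths = df.filter(col("subject") === "Maths")
val top   = maths.agg(max("marks").as("marks"))   // single-row DataFrame holding the max
maths.join(top, "marks").select("name", "marks").show()
```

The join on the single-row aggregate acts as a filter for rows equal to the maximum, which is the usual DataFrame-only substitute for a SQL subquery here.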