
Spark: create a csv file (must use Scala and DataFrame)

I've been learning Scala and DataFrames recently and I ran into a problem. It is about DataFrame operations. It must be solved using Scala and DataFrame, but NOT SparkSQL.

Problem:

  1. Create a csv file with 4 columns (person, class, subject, score) for a school and put some random data into the csv. Each person must have "Maths", "English", "Art" plus some other subjects, and there must be at least 3 classes.

  2. Write a Spark program to:

    • read a csv file

    • show the full data table

    • show how many persons per class

    • show the person and his score with the highest score in "Maths"

I have tried to solve it and googled it, but what I came up with uses SQL to solve it, and SQL is also the first solution Google gives.

I really don't know how to do it with Spark and DataFrame but NOT SparkSQL, even though the tutorial said it was a very easy question :(

Could anyone help me with it, for example by writing an example for me? Thank you so much, I would really appreciate it.

Sample csv file:

+-------+-------+---------+-------+   
| name  | class | subject | marks |
+-------+-------+---------+-------+
| ab    | 12    | Maths   | 72    |
+-------+-------+---------+-------+
| abc   | 12    | Maths   | 88    |
+-------+-------+---------+-------+
| abcd  | 11    | Arts    | 92    |
+-------+-------+---------+-------+
| abcde | 12    | English | 88    |
+-------+-------+---------+-------+
| bc    | 11    | Maths   | 99    |
+-------+-------+---------+-------+
| bcd   | 12    | English | 55    |
+-------+-------+---------+-------+
| bcde  | 11    | English | 77    |
+-------+-------+---------+-------+
| axax  | 10    | Maths   | 83    |
+-------+-------+---------+-------+
| amam  | 10    | English | 65    |
+-------+-------+---------+-------+
| arar  | 10    | Arts    | 66    |
+-------+-------+---------+-------+

  1. Read csv file:

    val df = spark.read.option("inferSchema","true").option("header","true").csv(filePath)

  2. Show the dataframe:

    df.show()

  3. Show how many persons per class:

    df.groupBy("class").count.show
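
    Note that groupBy("class").count counts rows. Since the problem says each person takes several subjects (so one row per subject), counting distinct names per class may be closer to "how many persons". A hedged alternative, assuming names identify persons in this csv:

    import org.apache.spark.sql.functions.countDistinct

    // count distinct person names per class instead of raw rows
    df.groupBy("class").agg(countDistinct("name").as("persons")).show()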

  4. Show the person and his score with the highest score in "Maths":

    df.filter(col("subject")==="Maths").orderBy(desc("marks")).limit(1).show

    Moreover, for the last question we can also filter on the class.
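
    If the highest Maths score is wanted per class rather than across the whole school, one possible sketch (not the only way) uses a window partitioned by class:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, desc, row_number}

    // rank Maths rows within each class by marks, then keep the top row of each class
    val byClass = Window.partitionBy("class").orderBy(desc("marks"))
    df.filter(col("subject") === "Maths")
      .withColumn("rn", row_number().over(byClass))
      .filter(col("rn") === 1)
      .drop("rn")
      .show()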
