Spark SQL: ORDER BY count DESC fails?

Question

There is a table with two columns, book and reader, holding book and reader IDs, respectively. When I try to order readers by the number of books they have read, I get an AbstractSparkSQLParser exception:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.log4j.Logger
import org.apache.log4j.Level
import org.apache.spark.sql.functions._

object Small {

  case class Book(book: Int, reader: Int)

  val recs = Array(
    Book(book = 1, reader = 30),
    Book(book = 2, reader = 10),
    Book(book = 3, reader = 20),
    Book(book = 1, reader = 20),
    Book(book = 1, reader = 10),
    Book(book = 1, reader = 40),
    Book(book = 2, reader = 40),
    Book(book = 2, reader = 30))

  def main(args: Array[String]): Unit = {
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
    // set up environment
    val conf = new SparkConf()
      .setMaster("local[5]")
      .setAppName("Small")
      .set("spark.executor.memory", "2g")
    val sc = new SparkContext(conf)

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    val df = sc.parallelize(recs).toDF()

    val readerGroups = df.groupBy("reader").count()
    readerGroups.show()

    readerGroups.registerTempTable("readerGroups")
    readerGroups.printSchema()

    // "SELECT reader, count FROM readerGroups ORDER BY count DESC"
    val readerGroupsSorted = sqlContext.sql("SELECT * FROM readerGroups ORDER BY count DESC")
    readerGroupsSorted.show()
    println("Group Cnt: " + readerGroupsSorted.count())
  }
}

This is the output; `groupBy` works fine:

    reader count
    40     2    
    10     2    
    20     2    
    30     2

Resulting schema:

    root
     |-- reader: integer (nullable = false)
     |-- count: long (nullable = false)

Yet SELECT * FROM readerGroups ORDER BY count DESC fails with an exception (see below). In fact, all other SELECT requests fail as well, except for SELECT * FROM readerGroups and SELECT reader FROM readerGroups, which both work. Why is that?

How can I make ORDER BY count DESC work?

    Exception in thread "main" java.lang.RuntimeException: [1.43] failure: ``('' expected but `desc' found

    SELECT * FROM readerGroups ORDER BY count DESC
                                              ^
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:40)
        at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:134)
        at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:134)
        at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:96)
        at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:95)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
        at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
        at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:38)
        at org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:138)
        at org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:138)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:138)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:933)
        at Small$.main(Small.scala:60)
        at Small.main(Small.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

Answer 1

The problem is the name of the column, count. COUNT is a reserved word in Spark SQL, so you can't use it as a bare column name in a query or sort by it. The parser reads count as the start of a function call, which is why the error says it expected ( but found desc.

You can try quoting the name with backticks:

select * from readerGroups ORDER BY `count` DESC
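
As a rough sketch of how the backtick form fits into the question's program, the snippet below uses a subset of the question's rows and the same Spark 1.x API the question uses (SQLContext, registerTempTable). The object name BacktickOrderBy and the local[2] master are placeholders chosen for illustration, not part of the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical object name, chosen for this example only.
object BacktickOrderBy {
  case class Book(book: Int, reader: Int)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("BacktickOrderBy"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // A subset of the question's rows: reader 40 has two books, readers 10 and 30 one each.
    val df = sc.parallelize(Seq(Book(1, 30), Book(2, 10), Book(1, 40), Book(2, 40))).toDF()

    // groupBy(...).count() names the aggregate column "count".
    val readerGroups = df.groupBy("reader").count()
    readerGroups.registerTempTable("readerGroups")

    // Backticks make the parser treat count as a column identifier
    // instead of the COUNT keyword.
    val sorted = sqlContext.sql("SELECT * FROM readerGroups ORDER BY `count` DESC")
    sorted.show()

    sc.stop()
  }
}
```

On Spark 2 and later, SparkSession and createOrReplaceTempView are the usual replacements for SQLContext and registerTempTable; the query string itself stays the same.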

The other option is to rename the count column to something else, such as NumReaders.
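
A minimal sketch of the rename, again assuming the question's Spark 1.x setup; the object name RenameCountColumn is made up for the example. It also shows a second route that this answer does not mention: sorting through the DataFrame API with orderBy and desc, which never passes the column name through the SQL parser.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.desc

// Hypothetical object name, chosen for this example only.
object RenameCountColumn {
  case class Book(book: Int, reader: Int)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("RenameCountColumn"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // A subset of the question's rows.
    val df = sc.parallelize(Seq(Book(1, 30), Book(2, 10), Book(1, 40), Book(2, 40))).toDF()
    val readerGroups = df.groupBy("reader").count()

    // Option A: rename the aggregate column before registering the table,
    // so the SQL text never contains the reserved word.
    val renamed = readerGroups.withColumnRenamed("count", "NumReaders")
    renamed.registerTempTable("readerGroups")
    sqlContext.sql("SELECT * FROM readerGroups ORDER BY NumReaders DESC").show()

    // Option B (an alternative, not from the answer): sort with the
    // DataFrame API and skip the SQL string entirely.
    readerGroups.orderBy(desc("count")).show()

    sc.stop()
  }
}
```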

Answer 2

Use a derived table to order by a calculated field (such as top, max, count ...):

SELECT * FROM
(
SELECT reader, count(book) AS book_count
FROM readerbook
GROUP by reader) a
ORDER BY book_count desc

Actually, on second thought, it might be possible to do the ORDER BY directly if you use an alias like this:

SELECT reader, count(book) AS book_count
FROM readerbook
GROUP by reader
ORDER BY book_count desc
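
To check both forms against your own Spark version, something like the following could be used. It is only a sketch: it assumes the question's raw DataFrame is registered under the name readerbook (the table name used in this answer, which the question's code never registers), and the object name OrderByAlias is a placeholder. I have not confirmed that the second, alias-only form parses on every Spark release, so treat it as something to try rather than a guarantee.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical object name, chosen for this example only.
object OrderByAlias {
  case class Book(book: Int, reader: Int)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("OrderByAlias"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // A subset of the question's rows, registered as the raw (ungrouped) table.
    val df = sc.parallelize(Seq(Book(1, 30), Book(2, 10), Book(1, 40), Book(2, 40))).toDF()
    df.registerTempTable("readerbook")

    // Form 1: aggregate inside a derived table, sort outside it.
    val viaDerivedTable = sqlContext.sql(
      """SELECT * FROM
        |(SELECT reader, count(book) AS book_count
        |   FROM readerbook
        |  GROUP BY reader) a
        |ORDER BY book_count DESC""".stripMargin)
    viaDerivedTable.show()

    // Form 2: sort by the alias directly, without the derived table.
    val viaAlias = sqlContext.sql(
      """SELECT reader, count(book) AS book_count
        |FROM readerbook
        |GROUP BY reader
        |ORDER BY book_count DESC""".stripMargin)
    viaAlias.show()

    sc.stop()
  }
}
```

Either way, the aggregate is exposed under the alias book_count, so the ORDER BY clause never refers to a column literally named count.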
