Spark SQL: ORDER BY count DESC fails?

There is a table with two columns, book and reader, holding book and reader IDs respectively. When I try to order readers by the number of books they have read, I get an AbstractSparkSQLParser exception:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.log4j.Logger
import org.apache.log4j.Level
import org.apache.spark.sql.functions._

object Small {

  case class Book(book: Int, reader: Int)

  val recs = Array(
    Book(book = 1, reader = 30),
    Book(book = 2, reader = 10),
    Book(book = 3, reader = 20),
    Book(book = 1, reader = 20),
    Book(book = 1, reader = 10),
    Book(book = 1, reader = 40),
    Book(book = 2, reader = 40),
    Book(book = 2, reader = 30))

  def main(args: Array[String]) {
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)
    // set up environment
    val conf = new SparkConf()
      .setMaster("local[5]")
      .setAppName("Small")
      .set("spark.executor.memory", "2g")
    val sc = new SparkContext(conf)

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._
    val df = sc.parallelize(recs).toDF()

    val readerGroups = df.groupBy("reader").count()
    readerGroups.show()

    readerGroups.registerTempTable("readerGroups")
    readerGroups.printSchema()

    // "SELECT reader, count FROM readerGroups ORDER BY count DESC"
    val readerGroupsSorted = sqlContext.sql("SELECT * FROM readerGroups ORDER BY count DESC")
    readerGroupsSorted.show()
    println("Group Cnt: "+readerGroupsSorted.count())

    sc.stop() // release the local Spark context
  }
}

This is the output; groupBy works all right:

    reader count
    40     2    
    10     2    
    20     2    
    30     2    

Resulting schema:

    root
     |-- reader: integer (nullable = false)
     |-- count: long (nullable = false)
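The counts above can be cross-checked without Spark. This is a plain-Scala sketch over the question's recs data that mirrors what groupBy("reader").count() computes; it only reproduces the result and is not how Spark executes the aggregation.

```scala
object ReaderCounts {
  // Same record type and sample data as in the question's Small object.
  case class Book(book: Int, reader: Int)

  val recs: Array[Book] = Array(
    Book(book = 1, reader = 30),
    Book(book = 2, reader = 10),
    Book(book = 3, reader = 20),
    Book(book = 1, reader = 20),
    Book(book = 1, reader = 10),
    Book(book = 1, reader = 40),
    Book(book = 2, reader = 40),
    Book(book = 2, reader = 30))

  // Rows per reader, i.e. the values df.groupBy("reader").count() puts in its count column.
  def countsByReader(rows: Seq[Book]): Map[Int, Long] =
    rows.groupBy(_.reader).map { case (reader, rs) => reader -> rs.size.toLong }

  def main(args: Array[String]): Unit = {
    // Sorted by reader only for a stable printout; Spark promises no row order after groupBy.
    countsByReader(recs.toSeq).toSeq.sortBy(_._1).foreach { case (reader, n) =>
      println(s"$reader\t$n")
    }
  }
}
```

Every reader appears in exactly two records, which matches the count of 2 shown for each reader in the output.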

Yet SELECT * FROM readerGroups ORDER BY count DESC fails with an exception (see below). In fact, all other SELECT queries fail as well, except for SELECT * FROM readerGroups and SELECT reader FROM readerGroups, which work. Why is that?

How can I make ORDER BY count DESC work?

    Exception in thread "main" java.lang.RuntimeException: [1.43] failure: ``('' expected but `desc' found

    SELECT * FROM readerGroups ORDER BY count DESC
                                              ^
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:40)
        at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:134)
        at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:134)
        at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:96)
        at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:95)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
        at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
        at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
        at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
        at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
        at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
        at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.apply(AbstractSparkSQLParser.scala:38)
        at org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:138)
        at org.apache.spark.sql.SQLContext$$anonfun$parseSql$1.apply(SQLContext.scala:138)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:138)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:933)
        at Small$.main(Small.scala:60)
        at Small.main(Small.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

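The problem is the name of the column, count. COUNT is a reserved word in Spark's SQL parser, so you can't use it as a bare column name in a query, for example to sort by that field. The parser reads count as the start of a COUNT(...) aggregate call, which is why the error says that `(` was expected but `desc` was found.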

You can try quoting the column name with backticks:

select * from readerGroups ORDER BY `count` DESC
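If query strings are built programmatically, the backtick quoting can be wrapped in a small helper. This is a minimal sketch: quoteIdent and orderByDesc are hypothetical helper names, not part of Spark's API, and the helper rejects names that already contain a backtick instead of assuming an escaping rule.

```scala
object QuotedOrderBy {
  // Hypothetical helper: wrap a column name in backticks so the parser reads it
  // as an identifier instead of the COUNT keyword.
  // Assumption: names containing a backtick are rejected rather than escaped.
  def quoteIdent(name: String): String = {
    require(!name.contains("`"), s"backtick in identifier not supported: $name")
    "`" + name + "`"
  }

  // Builds the backtick-quoted ORDER BY query for a given table and column.
  def orderByDesc(table: String, column: String): String =
    s"SELECT * FROM $table ORDER BY ${quoteIdent(column)} DESC"

  def main(args: Array[String]): Unit = {
    // The resulting string is what would be passed to sqlContext.sql(...) in the question's program.
    println(orderByDesc("readerGroups", "count"))
  }
}
```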

The other option is to rename the count column to something different, such as NumReaders.

Use a derived table to order by a calculated field (such as top, max, or count):

SELECT * FROM
(
  SELECT reader, count(book) AS book_count
  FROM readerbook
  GROUP BY reader
) a
ORDER BY book_count DESC

Actually, on second thought, it might be possible to order the results directly if you use an alias, like this:

SELECT reader, count(book) AS book_count
FROM readerbook
GROUP BY reader
ORDER BY book_count DESC
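Both queries compute a per-reader book count and then sort on that computed column. The plain-Scala sketch below mirrors that two-step shape; it assumes the answer's readerbook table holds the same rows as the question's recs. It adds a secondary sort on reader because ORDER BY book_count DESC alone leaves the order of tied rows unspecified, and in this data every reader has read two books.

```scala
object BookCountsDesc {
  // Assumption: readerbook holds the same (book, reader) rows as the question's recs.
  case class Book(book: Int, reader: Int)
  case class ReaderCount(reader: Int, bookCount: Long)

  val recs: Seq[Book] = Seq(
    Book(1, 30), Book(2, 10), Book(3, 20), Book(1, 20),
    Book(1, 10), Book(1, 40), Book(2, 40), Book(2, 30))

  // Inner step: SELECT reader, count(book) AS book_count ... GROUP BY reader.
  // book is never null here, so count(book) equals the number of rows per reader.
  def bookCounts(rows: Seq[Book]): Seq[ReaderCount] =
    rows.groupBy(_.reader).toSeq.map { case (reader, rs) => ReaderCount(reader, rs.size.toLong) }

  // Outer step: ORDER BY book_count DESC, plus reader ascending as an added
  // tie-breaker so rows with equal counts come out in a fixed order.
  def sortedDesc(counts: Seq[ReaderCount]): Seq[ReaderCount] =
    counts.sortBy(rc => (-rc.bookCount, rc.reader))

  def main(args: Array[String]): Unit = {
    sortedDesc(bookCounts(recs)).foreach(rc => println(s"${rc.reader}\t${rc.bookCount}"))
  }
}
```

With the question's data only the tie-breaker decides the order; appending one more row such as Book(3, 10) gives reader 10 a count of 3 and moves it to the top.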
