Why is this Scala code slow?

I'm running the following Scala code:

import scala.util.parsing.json._
import scala.io._

object Main {
  // Parse one JSON object and keep only its string-to-string entries
  def jsonStringMap(str: String) =
    JSON.parseFull(str) match {
      case Some(m: Map[_, _]) =>
        m.collect {
          // If this doesn't match, we'll just ignore the value
          case (k: String, v: String) => (k, v)
        }.toMap
      case _ => Map[String, String]()
    }

  def main(args: Array[String]): Unit = {
    val fh = Source.fromFile("listings.txt")
    try {
      // One JSON document per line
      fh.getLines().map(jsonStringMap).foreach(println)
    } finally {
      fh.close()
    }
  }
}

On my machine it takes ~3 minutes on the file from http://sortable.com/blog/coding-challenge/ . Equivalent Haskell and Ruby programs I wrote take under 4 seconds. What am I doing wrong?

I tried the same code without the map(jsonStringMap) and it was plenty fast, so is the JSON parser just really slow?
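One way to confirm where the time goes is to time the loop with and without the parse step. Below is a minimal sketch using only the standard library. The names `ParseTiming`, `timed`, and `countParsed` are hypothetical (they are not from the code above), and `parse` stands in for whichever parser is being measured:

```scala
object ParseTiming {
  // Evaluates `body` once and returns its result together with the
  // elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  // Applies `parse` to every line and returns the number of lines processed.
  // `parse` is a placeholder for the parser under test (an assumption,
  // e.g. jsonStringMap from the code above).
  def countParsed[A](lines: Iterator[String], parse: String => A): Int =
    lines.map(parse).size
}
```

Calling `ParseTiming.timed(ParseTiming.countParsed(fh.getLines(), jsonStringMap))` and comparing it against the same call with `identity[String] _` as the parser should show roughly how much of the run time the parser accounts for. A single run on the JVM includes warm-up, so treat the numbers as rough.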

It does seem likely that the default JSON parser is just really slow. However, I tried https://github.com/stevej/scala-json and, while that gets it down to 35 seconds, that's still much slower than Ruby.

I am now using https://github.com/codahale/jerkson , which is even faster! My program now runs in only 6 seconds on my data, only 3 seconds slower than Ruby, which is probably just the JVM starting up.

A quick look at the scala-user archive seems to indicate that nobody is doing serious work with the JSON parser in the Scala standard library.

See http://groups.google.com/group/scala-user/msg/fba208f2d3c08936

It seems the parser ended up in the standard library at a time when Scala was less in the spotlight and didn't face the expectations it does today.

Use Jerkson. Jerkson uses Jackson, which is always the fastest JSON library on the JVM (especially when stream reading/writing large documents).

Using my JSON library, I get an almost instantaneous parse of both files:

import com.github.seanparsons.jsonar._
import scala.io.Source
def parseLines[T](file: String, transform: (Iterator[String]) => T): T = {
  val log = Source.fromFile(file)
  val logLines = log.getLines()
  try { transform(logLines) } finally { log.close }
}
def parseFile(file: String) = parseLines(file, (iterator) => iterator.map(Parser.parse(_)).toList)
parseFile("products.txt"); parseFile("listings.txt")

However, as someone mentioned, it would be more useful to parse the whole thing as a single JSONArray rather than as many individual lines, as this does.
