Most efficient way to create a Scala Map from a file of strings?

I am trying to create a Map[String, String] from a CSV file, where the word is the key and the pronunciation is the value. I have managed to do it myself using the code below.

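import scala.io.{BufferedSource, Source}

def mapFile(filename: String): Map[String, String] = {
    var content: String = ""
    val file: BufferedSource = Source.fromFile(filename)

    // Append every line that does not contain "//" to one big string,
    // using "//" as the separator between lines.
    for (line <- file.getLines()) {
      if (!line.contains("//")) {
        content = content + line + "//"
      }
    }

    // Split the string back into lines, then each line into word and pronunciation.
    content.split("//").map(_.split("  ")).map(arr => arr(0) -> arr(1)).toMap
}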

So file reads the text file, and every line that does not contain "//" is appended to a single string, followed by "//" as a separator. That string is then split on "//" into entries, and each entry is split on "  " (two spaces) into a key and a value.

However, it is too slow.
Is there a more efficient way to create the map without it taking 5 minutes?

I believe your main problem is that you are reading the whole file into a String only to reprocess it afterwards. This means that you not only allocate twice the required memory, but also process the file twice. On top of that, each content + line + "//" copies the entire accumulated string, so the total work grows quadratically with the number of lines.

The first improvement you can make to your code is to do everything in a single iteration.

import scala.io.Source

def mapFile(filename: String): Map[String, String] =
  (for {
    line <- Source.fromFile(filename).getLines()
    // Skip blank lines and lines starting with ";;;".
    if line.nonEmpty && !line.startsWith(";;;")
    // Word and pronunciation are separated by two spaces.
    Array(word, pronunciation) = line.split("  ")
  } yield word -> pronunciation).toMap

The above code is equivalent to (and will be desugared to something very similar to) this:

import scala.io.Source

def mapFile(filename: String): Map[String, String] =
  Source
    .fromFile(filename)
    .getLines()
    // Skip blank lines and lines starting with ";;;".
    .filter(line => line.nonEmpty && !line.startsWith(";;;"))
    // Word and pronunciation are separated by two spaces.
    .map(line => line.split("  "))
    .map { case Array(word, pronunciation) => word -> pronunciation }
    .toMap
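
Both versions above leave the file handle open, and both fail at runtime on any line that does not split into exactly two parts. The following is a minimal sketch of one way to address that, assuming Scala 2.13 or later (for scala.util.Using). The name parseLines is a hypothetical helper, and silently skipping malformed lines is an assumption about the desired behavior, not something the question specifies.

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper: parsing is kept separate from file handling
// so it can be exercised without touching the file system.
def parseLines(lines: Iterator[String]): Map[String, String] =
  lines
    .filter(line => line.nonEmpty && !line.startsWith(";;;"))
    // Limit 2: split only at the first double space.
    .map(_.split("  ", 2))
    // Assumption: lines without a double space are skipped instead of throwing.
    .collect { case Array(word, pronunciation) => word -> pronunciation }
    .toMap

// Using.resource closes the Source even if parsing throws;
// toMap forces the lazy iterator before the file is closed.
def mapFile(filename: String): Map[String, String] =
  Using.resource(Source.fromFile(filename)) { source =>
    parseLines(source.getLines())
  }
```

The file is still read in a single pass; the only differences are that the resource is released deterministically and that malformed lines no longer abort the whole load.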

Additionally, if the input file is too big, you may take a look at FS2, Akka-Streams, or any other streaming library to process the file in chunks.
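
To give a rough idea of what processing in chunks means, here is a sketch that uses only the standard library rather than the FS2 or Akka-Streams APIs. It is a stand-in for illustration, not a replacement for those libraries. The name processInBatches, the batchSize parameter, and the handle callback are all hypothetical.

```scala
// Hypothetical helper: parses lines lazily and hands the entries to `handle`
// one batch at a time, so only one batch of entries is materialized at once.
def processInBatches(lines: Iterator[String], batchSize: Int)(
    handle: Seq[(String, String)] => Unit
): Unit =
  lines
    .filter(line => line.nonEmpty && !line.startsWith(";;;"))
    .map(_.split("  ", 2))
    // Assumption: lines without a double space are skipped.
    .collect { case Array(word, pronunciation) => word -> pronunciation }
    // grouped lazily yields Seqs of up to batchSize entries.
    .grouped(batchSize)
    .foreach(handle)
```

This only helps when each batch can be consumed and discarded (for example, written to a database). If the end goal is a single in-memory Map, the whole data set has to fit in memory regardless.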
