Most efficient way to create a Scala Map from a file of strings?
Now, I am trying to create a Map[String, String] from the CSV file, where the word is the key and the pronunciation is the value. I have managed to do it myself using the code below:
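import scala.io.{BufferedSource, Source}

def mapFile(filename: String): Map[String, String] = {
  var content: String = ""
  val file: BufferedSource = Source.fromFile(filename)
  // Append every line that does not contain "//" to one big String,
  // using "//" as the line separator.
  for (line <- file.getLines()) {
    if (!line.contains("//")) {
      // Each `+` builds a new String, copying everything read so far.
      content = content + line + "//"
    }
  }
  // Split back into lines, then split each line on a space into (word, pronunciation).
  content.split("//").map(_.split(" ")).map(arr => arr(0) -> arr(1)).toMap
}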
So the code reads the text file and, for every line that does not contain "//", appends that line plus a "//" separator to one string. It then splits that string on "//" to get the lines back, and splits each line on " " into a key and a value.
However, it is too slow.
Is there a more efficient way I can create the map without it taking 5 minutes?
I believe your main problem is that you are reading your whole file into a String only to reprocess it afterwards. This means that you not only allocate twice the memory you need, but you also process your file twice. On top of that, `content = content + line + "//"` builds a brand-new String on every line, copying everything read so far, so that loop alone takes time roughly quadratic in the file size.
The first improvement you can make to your code is to do everything in a single iteration:
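import scala.io.Source

def mapFile(filename: String): Map[String, String] =
  (for {
    line <- Source.fromFile(filename).getLines()
    // Skip empty lines and ";;;" comment lines.
    if line.nonEmpty && !line.startsWith(";;;")
    // Split on a space; a line that does not yield exactly two fields throws a MatchError.
    Array(word, pronunciation) = line.split(" ")
  } yield word -> pronunciation).toMap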
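The code above never closes the file it opens. A minimal sketch, assuming Scala 2.13 or later for `scala.util.Using`, that keeps the same for comprehension but lets `Using.resource` close the `Source` once `toMap` has consumed every line:

```scala
import scala.io.Source
import scala.util.Using

// Same single-pass logic, but the Source is closed when the block exits.
// Assumes Scala 2.13+ (scala.util.Using).
def mapFile(filename: String): Map[String, String] =
  Using.resource(Source.fromFile(filename)) { source =>
    (for {
      line <- source.getLines()
      // Skip empty lines and ";;;" comment lines.
      if line.nonEmpty && !line.startsWith(";;;")
      // Split on a space into the word and its pronunciation.
      Array(word, pronunciation) = line.split(" ")
    } yield word -> pronunciation).toMap // toMap forces the lazy iterator before close()
  }
```

Calling `toMap` inside the block matters: `getLines()` is lazy, so returning the iterator itself would leave it reading from a file that has already been closed.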
The above code is equivalent to the following (the compiler desugars the for comprehension into something very similar):
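import scala.io.Source

def mapFile(filename: String): Map[String, String] =
  Source
    .fromFile(filename)
    .getLines()
    // Skip empty lines and ";;;" comment lines.
    .filter(line => line.nonEmpty && !line.startsWith(";;;"))
    // Split each line on a space.
    .map(line => line.split(" "))
    // Any line that does not yield exactly two fields throws a MatchError here.
    .map { case Array(word, pronunciation) => word -> pronunciation }
    .toMap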
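With `map { case Array(word, pronunciation) => ... }`, a single malformed line aborts the whole load with a `scala.MatchError`. If you would rather skip such lines, one option is `collect`, which keeps only the elements the partial function is defined for. A sketch under the same assumptions about the file (space-separated word and pronunciation, `;;;` comment lines), which also closes the file via `scala.util.Using` (Scala 2.13+):

```scala
import scala.io.Source
import scala.util.Using

// Variant of the method-chain version that drops lines which do not split
// into exactly two fields, instead of throwing a MatchError.
def mapFile(filename: String): Map[String, String] =
  Using.resource(Source.fromFile(filename)) { source =>
    source
      .getLines()
      .filter(line => line.nonEmpty && !line.startsWith(";;;"))
      .map(line => line.split(" "))
      // collect applies the partial function only where it is defined,
      // so arrays of any other length are silently discarded.
      .collect { case Array(word, pronunciation) => word -> pronunciation }
      .toMap
  }
```

Whether silently dropping malformed lines is better than failing fast depends on how much you trust the input file.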
Additionally, if the input file is very large, you may want to look at FS2, Akka Streams, or another streaming library to process the file in chunks.