
How to generalise implementations of 'Seq[String] => Seq[Int]' and 'Iterator[String] => Iterator[Int]' for file processing?

Suppose I have a function Seq[String] => Seq[Int], e.g. def len(as: Seq[String]): Seq[Int] = as.map(_.length). Now I would like to apply this function to a text file, e.g. to transform all the lines of the file into numbers.

I read the text file with scala.io.Source.fromFile("/tmp/xxx.txt").getLines, which returns an iterator.
I can use toList or to(LazyList) to "convert" the iterator to a Seq, but then I read the whole file into memory.

So I need to write another function Iterator[String] => Iterator[Int], which is really just a copy of Seq[String] => Seq[Int]. Is that correct? What is the best way to avoid the duplicated code?

If you have an arbitrary function Seq[String] => Seq[Int], then

"I use toList or to(LazyList) to 'convert' the iterator to Seq, but in both cases I read the whole file into memory."

is the best you can do, because the function may start by looking at the end of the Seq[String], or at its length, etc.
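To illustrate why an arbitrary function cannot be streamed, here is a minimal sketch of a Seq[String] => Seq[Int] function that has to see the last element before it can produce the first result. The names EndDependent and relativeToLast are hypothetical and only serve this example.

```scala
object EndDependent {
  // Hypothetical function: each line's length minus the length of the LAST line.
  // The first output element cannot be computed until the whole input is known,
  // so the same logic cannot run lazily over an Iterator.
  def relativeToLast(as: Seq[String]): Seq[Int] = {
    val lastLen = as.lastOption.map(_.length).getOrElse(0)
    as.map(_.length - lastLen)
  }
}
```

A function like this forces the full input to be materialised, which is why toList (or a similar conversion) is unavoidable in the general case.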

And Scala doesn't let you look "inside" the function and work out "it's map(something), so I can just do the same map for iterators" (macros come with some caveats here, but they are not really useful in this case).
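If you already know that the transformation is element-wise, one possible way around this limitation (a sketch, not something the compiler does for you) is to expose the per-element function itself and map it over whichever collection you have. The names ElementWise and lineLength are assumptions made for this example.

```scala
object ElementWise {
  // Hypothetical element-level function shared by both call sites.
  val lineLength: String => Int = _.length

  // The same function is mapped over a strict Seq and over a lazy Iterator.
  val fromSeq: Seq[Int] = Seq("a", "bbb").map(lineLength)
  val fromIterator: Iterator[Int] = Iterator("a", "bbb").map(lineLength)
}
```

This only helps when the logic really is a plain map; anything that depends on other elements still needs the whole sequence.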

"So I need to write another function Iterator[String] => Iterator[Int], which is really just a copy of Seq[String] => Seq[Int]. Is that correct? What is the best way to avoid the duplicated code?"

If you control the definition of the function, you can use higher-kinded types to define a function that works for both cases. E.g. in Scala 2.13:

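import scala.collection.IterableOnceOps

// C is any collection type constructor whose map returns the same kind of collection
def len[C[A] <: IterableOnceOps[A, C, C[A]]](as: C[String]): C[Int] = as.map(_.length)

val x: Seq[Int] = len(Seq("a", "b"))
val y: Iterator[Int] = len(Iterator("a", "b"))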
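As a sketch of how the generic len could be applied to a file without loading it into memory, the example below passes the iterator from getLines straight to it. The FileLengths object and the sumOfLineLengths helper are hypothetical names, and scala.util.Using (available since Scala 2.13) is used here only as one way to make sure the source gets closed.

```scala
import scala.collection.IterableOnceOps
import scala.io.Source
import scala.util.Using

object FileLengths {
  // Same generic definition as above, repeated so the example is self-contained.
  def len[C[A] <: IterableOnceOps[A, C, C[A]]](as: C[String]): C[Int] = as.map(_.length)

  // Hypothetical helper: sums the lengths of all lines, reading the file lazily.
  // The iterator must be fully consumed inside the block, before the source is closed.
  def sumOfLineLengths(path: String): Int =
    Using.resource(Source.fromFile(path)) { source =>
      len(source.getLines()).sum
    }
}
```

Because C is inferred as Iterator here, the lines are processed one at a time and never collected into a Seq.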
