How to generalise implementations of 'Seq[String] => Seq[Int]' and 'Iterator[String] => Iterator[Int]' for file processing?
Suppose I've got a function Seq[String] => Seq[Int], e.g. def len(as: Seq[String]): Seq[Int] = as.map(_.length). Now I would like to apply this function to a text file, e.g. to transform all the file lines to numbers.
I read a text file with scala.io.Source.fromFile("/tmp/xxx.txt").getLines, which returns an iterator.
I can use toList or to(LazyList) to "convert" the iterator to a Seq, but then I read the whole file into memory.
So I need to write another function Iterator[String] => Iterator[Int], which is actually a copied version of Seq[String] => Seq[Int]. Is that correct? What is the best way to avoid the duplicated code?
If you have an arbitrary function Seq[String] => Seq[Int], then what you describe ("I use toList or to(LazyList) to "convert" the iterator to Seq but in both cases I read the whole file in the memory") is the best you can do, because the function can start by looking at the end of the Seq[String], or its length, etc.
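To make that concrete, here is a minimal sketch of a Seq[String] => Seq[Int] function that cannot be streamed. The name lenTimesCount is hypothetical and not part of the question; it only illustrates a function whose first step inspects the length of its input.

```scala
// Hypothetical function, not from the question: every output element depends on
// the total number of lines, so the whole input must be known up front.
def lenTimesCount(as: Seq[String]): Seq[Int] = {
  val n = as.length          // forces the entire sequence
  as.map(_.length * n)
}

val lines = Iterator("a", "bb", "ccc")
// The iterator has to be materialised before the function can be applied.
val result = lenTimesCount(lines.toList)
```

Nothing in the type Seq[String] => Seq[Int] distinguishes this function from a plain map, which is why materialising the iterator is the only safe general option.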
And Scala doesn't let you look "inside" the function and figure out "it's map(something), I can just do the same map for iterators" (there are some caveats involving macros, but they are not really useful here).
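Since the function body cannot be inspected, one possible workaround (an assumption on my part, not something the answer itself proposes) is to keep the per-element logic as its own String => Int value and pass it to map on whichever collection you have. The name lineLength is hypothetical.

```scala
// Per-element logic, written once.
val lineLength: String => Int = _.length

// The same function value works with map on a Seq and on an Iterator.
val fromSeq: Seq[Int] = Seq("a", "bb").map(lineLength)
val fromIterator: Iterator[Int] = Iterator("a", "bb").map(lineLength)
```

This only helps when the transformation really is element-wise; anything that needs the length or the end of the input still requires the whole sequence.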
"So I need to write another function Iterator[String] => Iterator[Int], which is actually a copied version of Seq[String] => Seq[Int]. Is it correct? What is the best way to avoid the duplicated code?"
If you control the definition of the function, you can use higher-kinded types to define a function which works for both cases. E.g. in Scala 2.13:
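import scala.collection.IterableOnceOps

// C is any collection type constructor whose map returns the same C (Seq, Iterator, ...)
def len[C[A] <: IterableOnceOps[A, C, C[A]]](as: C[String]): C[Int] = as.map(_.length)
val x: Seq[Int] = len(Seq("a", "b"))
val y: Iterator[Int] = len(Iterator("a", "b"))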
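Applied to the file case from the question, the generic len might be used roughly as follows. This is a sketch assuming Scala 2.13; the helper totalLineLength is hypothetical, and scala.util.Using is used only to make sure the Source is closed.

```scala
import scala.collection.IterableOnceOps
import scala.io.Source
import scala.util.Using

def len[C[A] <: IterableOnceOps[A, C, C[A]]](as: C[String]): C[Int] = as.map(_.length)

// Hypothetical helper: sums the line lengths of a file, one line at a time.
def totalLineLength(path: String): Int =
  Using.resource(Source.fromFile(path)) { source =>
    // C is inferred as Iterator, so lines are consumed lazily and never
    // collected into a Seq. The iterator must be fully consumed (here by sum)
    // before the source is closed.
    len(source.getLines()).sum
  }
```

The result has to be consumed inside the Using block; returning the Iterator[Int] itself would hand back an iterator over an already closed file.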