
Akka: How to extract a value in one graph stage and use it in the next

I am using Alpakka and Akka to process a CSV file. Since I have a bunch of CSV files that have to be added to the same stream, I would like to add a field to each row that contains information from the file name or the request. Currently I have something like this:

val source = FileIO.fromPath(Paths.get("10002070.csv"))
  .via(CsvParsing.lineScanner())

This streams a sequence of lists (lines) of ByteStrings (fields). The goal would be something like:

val filename = "10002070.csv"
val source = FileIO.fromPath(Paths.get(filename))
  .via(CsvParsing.lineScanner())
  .via(AddCSVFieldHere(filename))

This would create rows similar to:

10002070.csv,max,estimated,12,1,0

Here the file name is a field that does not exist in the original source.

I think it does not look very pretty to inject values mid-stream. Also, I would eventually like to determine the file names passed to the parsing in a stream stage that reads a directory.

What is the correct/canonical way to pass values through stream stages for later re-use?

You could transform the stream with map to add the file name to each List[ByteString]:

val fileName = "10002070.csv"
val source =
  FileIO.fromPath(Paths.get(fileName))
    .via(CsvParsing.lineScanner())
    .map(List(ByteString(fileName)) ++ _)

For example:

Source.single(ByteString("""header1,header2,header3
                           |1,2,3
                           |4,5,6""".stripMargin))
  .via(CsvParsing.lineScanner())
  .map(List(ByteString("myfile.csv")) ++ _)
  .runForeach(row => println(row.map(_.utf8String)))

// The above code prints the following:
// List(myfile.csv, header1, header2, header3)
// List(myfile.csv, 1, 2, 3)
// List(myfile.csv, 4, 5, 6)
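If a named stage in the spirit of the question's AddCSVFieldHere(filename) is preferred, the same map can be wrapped in a reusable Flow. The following is only a minimal sketch: prependField and the PrependFieldExample object are hypothetical names, not part of Alpakka, and the code assumes Akka 2.6 or later, where an implicit ActorSystem is enough to run a stream.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.alpakka.csv.scaladsl.CsvParsing
import akka.stream.scaladsl.{Flow, Source}
import akka.util.ByteString

import scala.concurrent.Await
import scala.concurrent.duration._

object PrependFieldExample {
  // Hypothetical helper (not part of Alpakka): prepends a fixed field to every parsed row.
  def prependField(value: String): Flow[List[ByteString], List[ByteString], NotUsed] = {
    val field = ByteString(value)
    Flow[List[ByteString]].map(row => field :: row)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: Akka 2.6+, so the implicit ActorSystem also provides the materializer.
    implicit val system: ActorSystem = ActorSystem("prepend-field")

    val done = Source.single(ByteString("header1,header2\n1,2\n"))
      .via(CsvParsing.lineScanner())
      .via(prependField("myfile.csv"))
      .runForeach(row => println(row.map(_.utf8String).mkString(",")))

    // Wait for the stream to finish, then shut the actor system down.
    try Await.result(done, 10.seconds)
    finally system.terminate()
  }
}
```

Because the value is captured when the Flow is created, each file needs its own instance of the stage, which is the same thing the inline map does.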

The same approach is applicable in the more general case in which you don't know the file names upfront. If you want to read all the files in a directory (assuming that all of these files are CSV files), concatenate the files into a single stream, and preserve the file name in each stream element, then you could do so with Alpakka's Directory utility in the following manner:

val source =
  Directory.ls(Paths.get("/my/dir")) // Source[Path, NotUsed]
    .flatMapConcat { path =>
       FileIO.fromPath(path)
         .via(CsvParsing.lineScanner())
         .map(List(ByteString(path.getFileName.toString)) ++ _)
    }
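For completeness, here is a self-contained sketch of the directory variant with the imports it needs. It is an illustration under a few assumptions rather than a definitive implementation: taggedRows and DirectoryCsvExample are hypothetical names, the extra filter for regular files ending in ".csv" is an addition that guards against subdirectories and other file types, and Akka 2.6 or later is assumed.

```scala
import java.nio.file.{Files, Path, Paths}

import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.alpakka.csv.scaladsl.CsvParsing
import akka.stream.alpakka.file.scaladsl.Directory
import akka.stream.scaladsl.{FileIO, Source}
import akka.util.ByteString

import scala.concurrent.Await
import scala.concurrent.duration._

object DirectoryCsvExample {
  // Hypothetical helper: a single stream of rows from every .csv file in `dir`,
  // each row prefixed with the name of the file it came from.
  def taggedRows(dir: Path): Source[List[ByteString], NotUsed] =
    Directory.ls(dir)
      // Assumption: skip subdirectories and anything that is not a .csv file.
      .filter(p => Files.isRegularFile(p) && p.getFileName.toString.endsWith(".csv"))
      .flatMapConcat { path =>
        val tag = ByteString(path.getFileName.toString)
        FileIO.fromPath(path)
          .via(CsvParsing.lineScanner())
          .map(tag :: _)
      }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("csv-directory")
    // The directory comes from the first argument, defaulting to the working directory.
    val dir = Paths.get(args.headOption.getOrElse("."))

    val done = taggedRows(dir)
      .runForeach(row => println(row.map(_.utf8String).mkString(",")))

    try Await.result(done, 30.seconds)
    finally system.terminate()
  }
}
```

flatMapConcat reads the files one after another, so rows from different files are not interleaved. The order in which the directory listing yields the files should not be relied upon.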
