
Using FileHelperAsyncEngine in F#

I am trying to load rows from a CSV file into an Elasticsearch database in F#, using FileHelpers to read the CSV. Everything works for small test files with the snippet below, which reads all records at once:

let readRows<'T>(filePath:string) =
    let engine = FileHelperEngine(typeof<'T>)

    engine.ReadFile(filePath)
    |> Array.map (fun row -> row :?> 'T)

Unfortunately, it also needs to be able to read larger files line by line; many of their columns are discarded later. The function FileHelperAsyncEngine.BeginReadFile returns an IDisposable.

let readRowsAsync<'T>(filePath:string) =
    let engine = new FileHelperAsyncEngine(typeof<'T>)

    engine.BeginReadFile(filePath:string)
    |> ...

How can I further process this object into an array of 'T?

According to the documentation, after you call BeginReadFile, the engine itself becomes an enumerable sequence over which you can iterate (which is a very strange design decision). So you can just build your own sequence on top of it:

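let readRowsAsync<'T>(filePath:string) = 
  seq {
    let engine = new FileHelperAsyncEngine(typeof<'T>)
    // Disposed when the sequence ends or the consumer stops iterating.
    use disposable = engine.BeginReadFile(filePath)

    // shouldDiscard and map stand for your own filter and projection.
    for r in engine do
      if not (shouldDiscard r) then yield (map r)
  }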

Note that I'm using the use binding, rather than let. This ensures that the disposable is disposed once the sequence ends or the consumer stops iterating over it.
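
This disposal behaviour can be checked without FileHelpers. The following is only a minimal sketch: events, makeResource and rows are hypothetical names, and the logging IDisposable merely stands in for the object returned by BeginReadFile.

```fsharp
open System

// Records what happens, in order.
let events = ResizeArray<string>()

// Hypothetical stand-in for the IDisposable returned by BeginReadFile.
let makeResource () =
    events.Add "opened"
    { new IDisposable with
        member this.Dispose() = events.Add "disposed" }

// Same shape as readRowsAsync: the use binding lives inside the seq.
let rows () =
    seq {
        use resource = makeResource ()
        for i in 1 .. 5 do
            yield i
    }

// Creating the sequence does not open anything yet.
let s = rows ()
let openedBeforeIteration = events.Contains "opened"

// The consumer stops after two of the five items...
let firstTwo = s |> Seq.take 2 |> Seq.toList

// ...and the resource has been disposed anyway.
let disposedAfterEarlyStop = events.Contains "disposed"
```

Here openedBeforeIteration should be false and disposedAfterEarlyStop true: the resource is acquired only when iteration starts and released as soon as the consumer is done, even if it did not read to the end.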

Note that the following will not work, even though it will compile:

let readRowsAsync<'T>(filePath:string) = 
  let engine = new FileHelperAsyncEngine(typeof<'T>)
  use disposable = engine.BeginReadFile(filePath)

  engine |> Seq.filter (not << shouldDiscard) |> Seq.map map

If you do it this way, the disposable is disposed when the function returns, which is before the resulting enumeration is iterated over, so the file is closed before anything has been read from it. To ensure that the disposable is disposed at the right time, you must enclose the whole thing in a seq expression.
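
The ordering problem can be reproduced without FileHelpers. This is a minimal sketch under stated assumptions: trace, openResource and brokenRows are hypothetical names, the logging IDisposable stands in for the object returned by BeginReadFile, and Seq.init stands in for the lazily enumerated engine.

```fsharp
open System

// Records what happens, in order.
let trace = ResizeArray<string>()

// Hypothetical stand-in for the IDisposable returned by BeginReadFile.
let openResource () =
    trace.Add "opened"
    { new IDisposable with
        member this.Dispose() = trace.Add "disposed" }

// Same shape as the broken version: use sits outside any seq expression.
let brokenRows () =
    use resource = openResource ()
    // Seq.init is lazy, so no element is produced before the function returns.
    Seq.init 3 (fun i ->
        trace.Add (sprintf "read %d" i)
        i)

let items = brokenRows () |> Seq.toList

// The disposal is recorded before the first read.
let disposedFirst = trace.IndexOf "disposed" < trace.IndexOf "read 0"
```

In this stand-in the reads still succeed because nothing real was closed; with an actual file handle, the same ordering means the consumer enumerates an engine whose file is already closed.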

If you really want to use Seq.filter / Seq.map instead of for / yield, you can still do this, but inside the seq expression, like this:

let readRowsAsync<'T>(filePath:string) = 
  seq {
    let engine = new FileHelperAsyncEngine(typeof<'T>)
    use disposable = engine.BeginReadFile(filePath)

    yield! engine |> Seq.filter (not << shouldDiscard) |> Seq.map map
  }

You can also move the filtering and mapping out of the seq expression (which would make your function more reusable), but the seq expression itself must remain in place, because it controls the disposing part:

let readRowsAsync<'T>(filePath:string) = 
  seq {
    let engine = new FileHelperAsyncEngine(typeof<'T>)
    use disposable = engine.BeginReadFile(filePath)

    yield! engine
  }

let results = 
  readRowsAsync<SomeType>( "someFile.txt" )
  |> Seq.filter (not << shouldDiscard) 
  |> Seq.map map

Finally, be careful when handling this sequence, because it is holding on to an unmanaged resource (i.e. an open file): don't hold it open for a long time, don't employ blocking operations while processing it, etc.
