
Is Time Series implementation using functional programming (F#) recommended?

I am developing a project in .NET, part of which involves manipulating time series.

Since the main part of the project has been implemented in C#, I've sketched an object-oriented design inheriting from SortedDictionary<DateTime,T>.

However, I've been in love with functional programming for the last few years. Since this component will be subject to pretty wild and intense algorithms, I would like to process it in parallel, and I would enjoy having an immutable structure.

I thought about designing it in F# by defining a type as follows:

type TimeSeries<'t> = (DateTime * 't) seq

and going on with it.

It would have the advantage of being immutable, and the execution in parallel would be pretty straightforward using F#'s Async module. I could also use the unit of measure feature of F#.

I am just a bit scared of having to use the results of the computations in C#, and I wondered if someone who's tried already could give me some feedback about the result in practice.

Was it easy to use in the end or was it too complicated to switch from C# to F#?

Isn't the fact that the collection is immutable an efficiency problem when the time series get larger?

Will I be able to keep the type generic when I try to divide elements, or will I have to switch to TimeSeries<float> pretty quickly with my functions?

If I want to use a C#-based algorithm on the time series for some features, will that make this whole idea useless?

Have you got some reference of research done on the efficiency of functional implementation of time series?

"It would have the advantage of being immutable, and the execution in parallel would be pretty straightforward using F#'s Async module."

On the contrary, seq values are slow and inherently serial. The literal F# equivalent of SortedDictionary is Map, but it has no support for parallelism. The Async module is good for asynchronous concurrent programming but bad for parallelism.

Assuming you want fast search by time and in-order iteration but not incremental insertion/deletion, you want a sorted array of KeyValuePair<DateTime, 'T>, because this offers excellent locality and, therefore, good cache complexity for parallel algorithms. Note that arrays can be purely functional if you avoid mutating them. Beware that F# 2 does not type-specialize operations (like comparison) over DateTime, so you'll need to call them manually.
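As a rough sketch of that sorted-array approach (the names ofPoints, tryFind and mapParallel are made up for illustration and come from no library), the series can be built once, searched with a hand-written binary search that calls DateTime.Compare explicitly, and mapped in parallel with Array.Parallel.map:

```fsharp
open System
open System.Collections.Generic

/// A time series as an array sorted by timestamp (illustrative alias).
type TimeSeries<'T> = KeyValuePair<DateTime, 'T>[]

/// Build a series from unsorted points; the input is copied, never mutated.
let ofPoints (points: seq<DateTime * 'T>) : TimeSeries<'T> =
    points
    |> Seq.map (fun (t, v) -> KeyValuePair(t, v))
    |> Seq.toArray
    |> Array.sortWith (fun a b -> DateTime.Compare(a.Key, b.Key))

/// Binary search by timestamp, comparing via DateTime.Compare directly.
let tryFind (time: DateTime) (series: TimeSeries<'T>) : 'T option =
    let rec go lo hi =
        if lo > hi then None
        else
            let mid = lo + (hi - lo) / 2
            let c = DateTime.Compare(series.[mid].Key, time)
            if c = 0 then Some series.[mid].Value
            elif c < 0 then go (mid + 1) hi
            else go lo (mid - 1)
    go 0 (series.Length - 1)

/// Element-wise transform that runs in parallel and returns a new array.
let mapParallel (f: 'T -> 'U) (series: TimeSeries<'T>) : TimeSeries<'U> =
    series
    |> Array.Parallel.map (fun kv -> KeyValuePair(kv.Key, f kv.Value))
```

None of these functions mutates its input, so the array is used in a purely functional way even though the type itself is mutable.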

The idiomatic purely functional equivalent of that would be a balanced search tree partitioned by time:

type TimeSeries<'a> =
  | Leaf of DateTime * 'a
  | Branch of TimeSeries<'a> * DateTime * TimeSeries<'a>

This permits elegant "parallel" functions. However, the reality is that purely functional programming is not good for multicore parallelism because it cannot provide any assurances about locality and, therefore, the cache complexity of purely functional algorithms is unpredictable and performance is often poor.
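To illustrate what such a "parallel" function might look like, here is a minimal sketch of a divide-and-conquer reduction over the tree. The reduce function and its depth parameter are my own assumptions for the example: depth limits how many levels spawn a task, and f is assumed to be associative.

```fsharp
open System
open System.Threading.Tasks

type TimeSeries<'a> =
  | Leaf of DateTime * 'a
  | Branch of TimeSeries<'a> * DateTime * TimeSeries<'a>

/// Combine all values with an associative function f.
/// The top `depth` levels evaluate their left subtree on another task.
let rec reduce (f: 'a -> 'a -> 'a) (depth: int) (series: TimeSeries<'a>) : 'a =
    match series with
    | Leaf (_, v) -> v
    | Branch (l, _, r) when depth > 0 ->
        // Left half runs as a task while this thread handles the right half.
        let left = Task<'a>.Factory.StartNew(fun () -> reduce f (depth - 1) l)
        let right = reduce f (depth - 1) r
        f left.Result right
    | Branch (l, _, r) ->
        // Below the cutoff, fall back to plain sequential recursion.
        f (reduce f 0 l) (reduce f 0 r)
```

The code is short, but nothing in it controls where the tree nodes live in memory, which is the locality problem described above.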

"Isn't the fact that the collection is immutable an efficiency problem when the time series get larger?"

Depends entirely on what you want to do with it.

"Have you got some reference of research done on the efficiency of functional implementation of time series?"

You haven't said anything about the algorithms you intend to implement or even the operations you want to be fast, so it is difficult to talk about measured performance in a useful way. Running a quick benchmark on my netbook, inserting 1,000,000 bindings into a dictionary, shows that the mutable SortedDictionary takes 5.2s and the immutable Map takes 11.8s, so there is a significant but not huge difference. Building the equivalent array takes just 0.027s. Iterating then takes 0.38s, 0.20s and 0.01s, respectively.

"I am just a bit scared of having to use the results of the computations in C#, and I wondered if someone who's tried already could give me some feedback about the result in practice."

Just expose a standard .NET interface from your F# code and it is easy.
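For example, something along these lines keeps F#-specific types out of the public surface. This is only a sketch: TimeSeriesOps and MovingAverage are hypothetical names, and the moving average is just a stand-in computation.

```fsharp
open System
open System.Collections.Generic

/// Hypothetical facade: every type in the signature is a BCL type.
type TimeSeriesOps =
    /// Trailing moving average over `window` consecutive points.
    /// Each result is stamped with the time of the last point in its window.
    static member MovingAverage
        (series: IEnumerable<KeyValuePair<DateTime, float>>, window: int)
        : IEnumerable<KeyValuePair<DateTime, float>> =
        let averaged =
            series
            |> Seq.toArray
            |> Array.windowed window
            |> Array.map (fun w ->
                let mean = w |> Array.averageBy (fun kv -> kv.Value)
                KeyValuePair((Array.last w).Key, mean))
        averaged :> IEnumerable<_>
```

From C# this looks like an ordinary static method, e.g. TimeSeriesOps.MovingAverage(points, 20), with no F# functions, tuples or options leaking through.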

Some points to note:

  • If you want to expose an F# component API to C# (or another CLR language), use BCL (or OO) types in the public API of the F# component. Otherwise, callers will need to understand the types that the F# core library uses to implement the functional feel of F#, e.g. FSharpFunc.
  • Parallel (read-only) processing of immutable data structures works well because you can be sure that nobody will modify the data behind the scenes, so you don't need locking etc.
  • An immutable data structure may not sound good when you want to, say, append an item to the end of a list, which in theory means copying the whole list along with the new item. This is usually avoided by smart implementations of immutable data structures, such as Clojure's persistent vector, which has no built-in equivalent in F# (yet).
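For what it is worth, the copying concern in the last point depends on the operation. The small sketch below (all values are made up) shows that prepending to an F# list reuses the existing list as its tail, that appending builds a separate list, and that Map.add returns a new map while leaving the old one unchanged:

```fsharp
open System

let original = [2; 3; 4]

// Prepending allocates one new cell; its tail is the original list itself.
let extended = 1 :: original
let sharesTail = Object.ReferenceEquals(List.tail extended, original)

// Appending cannot reuse the left-hand list, so its cells are rebuilt.
let appended = original @ [5]

// Map.add produces a new map; the old map keeps its contents.
let prices = Map.ofList [ DateTime(2020, 1, 1), 10.0 ]
let prices2 = prices |> Map.add (DateTime(2020, 1, 2)) 11.0

printfn "prepend shares tail: %b" sharesTail
printfn "original: %d items, appended: %d items" original.Length appended.Length
printfn "prices: %d entries, prices2: %d entries" prices.Count prices2.Count
```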

I hope the above points help you decide what would best fit your specific implementation.
