Is Time Series implementation using functional programming (F#) recommended?

Question

I am developing aa project in .NET, part of which I will be manipulating times series.

Since the main part of the project has been implemented in C#, I've sketched an object-oriented design inheriting from SortedDictionary<DateTime,T> .

However, I've been in love with functional programming for the last few years, and I figured that since this component will be subject to pretty wild and intense algorithms, I would be willing to process it in parallel, and I would enjoy having an immutable structure.

I thought about designing it in F# using defining a type as follows:

type TimeSeries<'t> = (DateTime * 't) seq

and going on with it.

It would have the advantage of being immutable, and the execution in parallel would be pretty straightforward using F#'s Async module. I could also use the unit of measure feature of F#.

I am just a bit scared of having to use the results of the computations in C#, and I wondered if someone who's tried already could give me some feedback about the result in practice.

Was it easy to use in the end or was it too complicated to switch from C# to F#?

Isn't the fact that the collection is immutable an efficiency problem when the time series get larger?

Will I be able to keep the type generic when I will try to divide elements, or will I have to switch to TimeSeries<float> pretty quickly with my functions?

If I want to use C# based algorithm on the time series for some features, will that make this whole idea useless?

Have you got some reference of research done on the efficiency of functional implementation of time series?

Answer 1

It would have the advantage of being immutable, and the execution in parallel would be pretty straightforward using F#'s Async module.

On the contrary, seq are slow and inherently serial. The literal F# equivalent of SortedDictionary is Map but it has no support for parallelism. The Async module is good for asynchronous concurrent programming but bad for parallelism.

Assuming you want fast search by time and iterate in-order but not incremental insertion/deletion then you want a sorted array of KeyValuePair<DateTime, 'T> because this offers excellent locality and, therefore, cache complexity for parallel algorithms. Note that arrays can be purely functional if you avoid mutating them. Beware that F# 2 does not type specialize operations (like comparison) over DateTime so you'll need to call them manually.

The idiomatic purely functional equivalent of that would be a balanced search tree partitioned by time:

type TimeSeries<'a> =
  | Leaf of DateTime * 'a
  | Branch of TimeSeries<'a> * DateTime * TimeSeries<'a>

This permits elegant "parallel" functions. However, the reality is that purely functional programming is not good for multicore parallelism because it cannot provide any assurances about locality and, therefore, the cache complexity of purely functional algorithms is unpredictable and performance is often poor.

Isn't the fact that the collection is immutable an efficiency problem when the time series get larger?

Depends entirely on what you want to do with it.

Have you got some reference of research done on the efficiency of functional implementation of time series?

You haven't said anything about the algorithms you intend to implement or even the operations you want to be fast so it is difficult to talk about measured performance in a useful way. Running a quick benchmark on my netbook, inserting 1,000,000 bindings into a dictionary, shows that the mutable SortedDictionary takes 5.2s and immutable Map takes 11.8s so there is a significant but not huge difference. Building the equivalent array takes just 0.027s. Iterating then takes 0.38s, 0.20s and 0.01s, respectively.

I am just a bit scared of having to use the results of the computations in C#, and I wondered if someone who's tried already could give me some feedback about the result in practice.

Just expose a standard .NET interface from your F# code and it is easy.

Answer 2

Some points to note:

In case you want to expose a F# component API to C# (or other CLR language) then you should use BCL (or OO types) in the public API of the F# component. Otherwise you will need to understand all the types that F# core library uses to implement the Functional feel of F#. Ex: FsharFunc
Parallel processing (read only) for immutable data structure is good as you are sure that nobody will modify the data from behind the scenes and hence you don't need to do locking etc.
Immutable data structure "may" not sound good when you want to lets says append a item to the end of a list, which theoretically in case of immutable data will copy the whole list along with the new item. This is usually avoided by some smart implementations of immutable data structures like Persistent data structure in clojure which are not there in F# (yet)

I hope the above points helps you in deciding what would best fit your specific implementation.

Is Time Series implementation using functional programming (F#) recommended?

Question

2 answers

solution1
8 ACCPTED 2011-11-07 22:59:00

solution2
2 2011-11-07 04:22:26

Is Time Series implementation using functional programming (F#) recommended?

Question

2 answers

solution1 8 ACCPTED 2011-11-07 22:59:00

solution2 2 2011-11-07 04:22:26

solution1
8 ACCPTED 2011-11-07 22:59:00

solution2
2 2011-11-07 04:22:26