How would you approach this problem in F#? (high frequency sensor data)

Question

I'm a mechanical engineering grad student and my adviser has just asked me to write a data visualization utility for one of our sensor projects. As it's summer and he wants me to have some fun with it, I thought this would be a great time to learn a language adept at scientific computing, so I went ahead and plowed right into F#.

As I'm new to the functional programming paradigm, I'm having a little difficulty structuring my program properly, especially given how easily F# lets you combine OO and FP. My task is the following:

We have dozens of sensors constantly reporting data (once every few seconds).
I need to connect to all sensors simultaneously, create an in-memory timeseries of each sensor's output, and then calculate in real time various statistics on these timeseries.
Every few hours I need to flush the data into a binary file for logging purposes.

How should I be designing my application? I've thought about something like this:

1. I was planning on connecting to each sensor to start receiving data and then dumping this data onto a message queue.
2. I'd have an event-driven processing function that receives data from the queue. When data arrives, it determines which sensor the data came from and places the data into the timeseries object of the corresponding sensor.
3. Every time a sensor's timeseries object is added to, I could fire an event and have my statistics functions crunch the new data for that sensor.

Obviously I need to maintain some kind of state in this application, so I'd add the following mutable data structures. I would use a generic .NET resizable List for storing my timeseries data and derive a new class from it that fires an event whenever data is added. I could store the mappings between sensorid and the actual timeseries container in a Dictionary (when data is popped off the queue, I can read the sensorid field, grab the timeseries container for that sensorid, and then easily add new data). I could also have a second Dictionary to store mappings between sensorid and the various timeseries containing statistics for that sensor's timeseries. When a main sensor timeseries is added to, it fires an event that calls all of the statistics functions to run on the new data and store their results in the appropriate dictionary entry for that sensorid.

I haven't thought about how to save the data much yet, but I figured I could just write out binary files with the data.

Any advice, ideas, or references are appreciated.

Thanks :)

Answer 1

I'd recommend against implementing your new project in F# until you get a better handle on the language, or you'll just end up writing C# code in F# syntax. At least for projects at work, it's better to use a tool you know than to use company money to experiment with a new technology.

But, since you asked, I'd use a mailbox processor to act as a thread-safe message handling queue for all sensor input. The message queue can recompute statistics for each message it receives. Off the top of my head, I'm thinking of a setup like this:

type SensorMsg<'a, 'b> =
    | Fetch of 'a AsyncReplyChannel
    | Post of 'b
    | Die

type SensorMessageQueue<'a, 'b>(emptyStats : 'a, compute : 'a -> 'b -> 'a) =
    let queue = MailboxProcessor.Start(fun inbox ->
            let rec loop stats =
                async {
                    let! msg = inbox.Receive()
                    match msg with
                    | Die -> return ()
                    | Post(x) -> return! loop (compute stats x)
                    | Fetch(x) -> x.Reply(stats); return! loop stats
                }
            loop emptyStats
        )

    member this.Post(x) = queue.Post(Post(x))
    member this.Fetch() = queue.PostAndReply(fun replyChannel -> Fetch(replyChannel))
    member this.Die() = queue.Post(Die)

Something like this could hold real-time running statistics of your sensors. For example, let's say I wanted to post something that would hold a running average:

let averager =
    SensorMessageQueue(
            (0, 0), (* initial state of sum, total *)
            (fun (sum, total) input -> sum + input, total + 1)
        )

averager.Post(75)
averager.Post(90)
averager.Post(80)
let x = averager.Fetch() (* returns (245, 3) *)
averager.Post(100)
let y = averager.Fetch() (* returns (345, 4) *)

A setup like this is relatively easy to work with, thread-safe, and uses no mutable state (all "state" lives in the arguments of the recursive loop). If you think about it, this is basically a glorified Seq.unfold, only implemented using a mailbox processor. It might be overkill or it might be exactly what you need for your project; it depends on your requirements.
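
Since the question involves dozens of sensors, here is a rough sketch of how the same state-in, state-out idea might be keyed by sensor. The Reading record, the Stats alias, and the update and mean functions are hypothetical names made up for illustration, not part of the code above. The assumption is that update would be passed as the compute argument and Map.empty as the initial state; the sketch folds over a list so that it runs on its own without the mailbox.

```fsharp
// Hypothetical reading type: a sensor id plus one measured value.
type Reading = { SensorId : string; Value : float }

// Running (sum, count) per sensor, held in an immutable Map.
type Stats = Map<string, float * int>

// Same shape as the compute argument: old state -> new input -> new state.
let update (stats : Stats) (r : Reading) : Stats =
    let sum, count =
        match Map.tryFind r.SensorId stats with
        | Some s -> s
        | None -> (0.0, 0)
    Map.add r.SensorId (sum + r.Value, count + 1) stats

// Mean for one sensor, or None if that sensor has not reported yet.
let mean (stats : Stats) (sensorId : string) : float option =
    Map.tryFind sensorId stats
    |> Option.map (fun (sum, count) -> sum / float count)

// Folding over a batch reaches the same state the mailbox would reach
// after receiving these readings one at a time.
let readings =
    [ { SensorId = "a"; Value = 75.0 }
      { SensorId = "b"; Value = 10.0 }
      { SensorId = "a"; Value = 85.0 } ]

let totals = List.fold update Map.empty readings
```

Because update is an ordinary pure function, it can be tested with List.fold like this before it is wired into any message queue.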

Answer 2

I may be totally wrong on this, and it isn't an answer to your question, so vote me down at will, but my 'feeling' from just over a year's interest in F# is that there aren't that many people out there (or 'in here') with much experience using F# in scientific/technical environments (I know of one exception).

It may be that Microsoft focused their efforts on F# outreach to the financial community - rather unfortunate timing in retrospect!

That doesn't mean you won't get good answers to your specific question; you just aren't likely to get an "oh yeah, I did something like that last week, watch out for this problem" kind of reply.

As an aside, are you aware of units of measure in F#? It's a very interesting feature for anything technical.
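
For a flavour of what that looks like, here is a minimal sketch. The units degC and s and the sample values are made up for illustration; real sensor code would define whatever units its hardware actually reports.

```fsharp
// Hypothetical units for a temperature sensor sampled over time.
[<Measure>] type degC
[<Measure>] type s

let reading1 = 21.5<degC>
let reading2 = 23.0<degC>
let interval = 3.0<s>

// The compiler checks that the result really has units of degC per second.
let rate : float<degC/s> = (reading2 - reading1) / interval

// Mixing units is a compile-time error, so this line would not build:
// let bad = reading1 + interval
```

The units exist only at compile time, so they add no runtime cost to the statistics calculations.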

2 answers

solution1
5 2009-05-30 03:04:36

solution2
2 ACCEPTED 2009-06-02 06:54:04
