Rx operator to distinct sequences

IMPORTANT: for a description of the results and some more details, please also have a look at my answer.

I need to group and filter a sequence of objects/events that are usually replicated, buffering them with a TimeSpan interval. I'll try to explain it better with a kind of marble diagram:

X-X-X-X-X-Y-Y-Y-Z-Z-Z-Z-X-X-Y-Z-Z

would produce

X---Y---Z---X---Y---Z

where X, Y and Z are different event types, and '---' means the interval. Additionally, I would also like to apply the distinct by a key property that is available on all types, because they have a common base class:

X, Y, Z : A

and A contains a property Key. Using the notation X.a to mean X.Key = a, a final sample would be:

X.a-X.b-X.a-Y.b-Y.c-Z.a-Z.a-Z.c-Z.b-Z.c

would produce

X.a-X.b---Y.b-Y.c-Z.a-Z.c-Z.b

Can anybody help me put together the required Linq operators (probably DistinctUntilChanged and Buffer) to achieve this behavior? Thanks

UPDATE 18.08.12:

As requested, I'll try to give a better explanation. We have devices collecting events and sending them to a web service. These devices have an old logic (which we can't change, for backward compatibility): they continuously send an event until they receive an acknowledgement; after the acknowledgement, they send the next event in their queue, and so on. Events contain the network address of the unit and some other properties that distinguish the events in each device's queue. An event looks like this:

class Event
{
    public string NetworkAddress { get; }

    public string EventCode { get; }

    public string AdditionalAttribute { get; }
}

The goal is to process, every 5 seconds, the distinct events received from all devices, storing information in the database (that's why we don't want to do it in batches) and sending the ack to the device. Let's look at an example with only two devices and some events:

Device 'a':
Event 1 (a1): NetworkAddress = '1', EventCode = A, AdditionalAttribute = 'x'
Event 2 (a2): NetworkAddress = '1', EventCode = A, AdditionalAttribute = 'y'
Event 3 (a3): NetworkAddress = '1', EventCode = B, AdditionalAttribute = 'x'

Device 'b':
Event 1 (b1): NetworkAddress = '2', EventCode = A, AdditionalAttribute = 'y'
Event 2 (b2): NetworkAddress = '2', EventCode = B, AdditionalAttribute = 'x'
Event 3 (b3): NetworkAddress = '2', EventCode = B, AdditionalAttribute = 'y'
Event 4 (b4): NetworkAddress = '2', EventCode = C, AdditionalAttribute = 'x'

Pn are the operations done by our server, explained later

Possible marble diagram (input streams + output stream):

Device 'a'          : -[a1]-[a1]-[a1]----------------[a2]-[a2]-[a2]-[a3]-[a3]-[a3]-...
Device 'b'          : ------[b1]-[b1]-[b2]-[b2]-[b2]------[b3]-[b3]-[b4]-[b4]-[b4]-...

Time                : ------------[1s]-----------[2s]------------[3s]------------[4s]-
DB/acks (rx output) : ------------[P1]-----------[P2]------------[P3]------------[P4]-

P1: Server stores and acknowledges [a1] and [b1]
P2: "      "      "   "            [b2]
P3: "      "      "   "            [a2] and [b3]
P4: "      "      "   "            [a3] and [b4]

In the end I think it is probably a simple combination of basic operators, but I'm new to Rx and I'm a bit confused, since there seem to be lots of operators (or combinations of operators) that produce the same output stream.

Update 19.08.12:

Please keep in mind that this code runs on a server and should run for days without memory leaks... I'm not sure about the behavior of subjects. At the moment, for each event I call a push operation on a service, which calls OnNext on a Subject, on top of which I should build the query (if I'm not wrong about the usage of subjects).

Update 20.08.12:

Current implementation, including a validation test; this is what I tried, and it seems to be the same as what @yamen suggested.

public interface IEventService
{
    // Persists the events
    void Add(IEnumerable<Event> events);
}

public class Event
{
    public string Description { get; set; }
}

/// <summary>
/// Implements the logic to handle events.
/// </summary>
public class EventManager : IDisposable
{
    private static readonly TimeSpan EventHandlingPeriod = TimeSpan.FromSeconds(5);

    private readonly Subject<EventMessage> subject = new Subject<EventMessage>();

    private readonly IDisposable subscription;

    private readonly object locker = new object();

    private readonly IEventService eventService;

    /// <summary>
    /// Initializes a new instance of the <see cref="EventManager"/> class.
    /// </summary>
    /// <param name="scheduler">The scheduler.</param>
    public EventManager(IEventService eventService, IScheduler scheduler)
    {
        this.eventService = eventService;
        this.subscription = this.CreateQuery(scheduler);
    }

    /// <summary>
    /// Pushes the event.
    /// </summary>
    /// <param name="eventMessage">The event message.</param>
    public void PushEvent(EventMessage eventMessage)
    {
        Contract.Requires(eventMessage != null);
        this.subject.OnNext(eventMessage);
    }

    /// <summary>
    /// Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
    /// </summary>
    /// <filterpriority>2</filterpriority>
    public void Dispose()
    {
        this.Dispose(true);
    }

    private void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Dispose managed resources: stop the query first, then the subject.
            this.subscription.Dispose();
            this.subject.Dispose();
        }
    }

    private IDisposable CreateQuery(IScheduler scheduler)
    {
        var buffered = this.subject
            .DistinctUntilChanged(new EventComparer())
            .Buffer(EventHandlingPeriod, scheduler);

        var query = buffered
            .Subscribe(this.HandleEvents);
        return query;
    }

    private void HandleEvents(IList<EventMessage> eventMessages)
    {
        Contract.Requires(eventMessages != null);
        var events = eventMessages.Select(this.SelectEvent);
        this.eventService.Add(events);
    }

    private Event SelectEvent(EventMessage message)
    {
        return new Event { Description = "evaluated description" };
    }

    private class EventComparer : IEqualityComparer<EventMessage>
    {
        public bool Equals(EventMessage x, EventMessage y)
        {
            return x.NetworkAddress == y.NetworkAddress && x.EventCode == y.EventCode && x.Attribute == y.Attribute;
        }

        public int GetHashCode(EventMessage obj)
        {
            var s = string.Concat(obj.NetworkAddress + "_" + obj.EventCode + "_" + obj.Attribute);
            return s.GetHashCode();
        }
    }
}

public class EventMessage
{
    public string NetworkAddress { get; set; }

    public byte EventCode { get; set; }

    public byte Attribute { get; set; }

    // Other properties
}

And the test:

public void PushEventTest()
    {
        const string Address1 = "A:2.1.1";
        const string Address2 = "A:2.1.2";

        var eventServiceMock = new Mock<IEventService>();

        var scheduler = new TestScheduler();
        var target = new EventManager(eventServiceMock.Object, scheduler);
        var eventMessageA1 = new EventMessage { NetworkAddress = Address1, EventCode = 1, Attribute = 4 };
        var eventMessageB1 = new EventMessage { NetworkAddress = Address2, EventCode = 1, Attribute = 5 };
        var eventMessageA2 = new EventMessage { NetworkAddress = Address1, EventCode = 1, Attribute = 4 };
        scheduler.Schedule(() => target.PushEvent(eventMessageA1));
        scheduler.Schedule(TimeSpan.FromSeconds(1), () => target.PushEvent(eventMessageB1));
        scheduler.Schedule(TimeSpan.FromSeconds(2), () => target.PushEvent(eventMessageA1));

        scheduler.AdvanceTo(TimeSpan.FromSeconds(6).Ticks);

        eventServiceMock.Verify(s => s.Add(It.Is<List<Event>>(list => list.Count == 2)), Times.Once());

        scheduler.Schedule(TimeSpan.FromSeconds(3), () => target.PushEvent(eventMessageB1));

        scheduler.AdvanceTo(TimeSpan.FromSeconds(11).Ticks);

        eventServiceMock.Verify(s => s.Add(It.Is<List<Event>>(list => list.Count == 1)), Times.Once());
    }

Additionally, I stress again that it is really important that the software can run for days without problems, handling thousands of messages. To make it clear: the test doesn't pass with the current implementation.
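One possible reason the test fails: DistinctUntilChanged only suppresses consecutive duplicates, so the interleaved sequence a1, b1, a1 pushed by the test passes through unchanged and the first buffer holds three items instead of two. A minimal sketch of the alternative order, assuming the goal is that each interval contains every (NetworkAddress, EventCode, Attribute) combination at most once: buffer first, then apply LINQ's Distinct to each buffer. The names EventBatching and DistinctBatches are hypothetical helpers, not part of Rx, and the System.Reactive package is assumed to be referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Concurrency;
using System.Reactive.Linq;

public sealed class EventMessage
{
    public string NetworkAddress { get; set; }
    public byte EventCode { get; set; }
    public byte Attribute { get; set; }
}

// Same equality rule as the EventComparer in the question.
public sealed class EventComparer : IEqualityComparer<EventMessage>
{
    public bool Equals(EventMessage x, EventMessage y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.NetworkAddress == y.NetworkAddress
            && x.EventCode == y.EventCode
            && x.Attribute == y.Attribute;
    }

    public int GetHashCode(EventMessage obj)
    {
        unchecked
        {
            var hash = obj.NetworkAddress == null ? 0 : obj.NetworkAddress.GetHashCode();
            hash = (hash * 31) + obj.EventCode;
            return (hash * 31) + obj.Attribute;
        }
    }
}

// Hypothetical helper (an assumption for illustration, not an Rx API).
public static class EventBatching
{
    public static IObservable<IList<EventMessage>> DistinctBatches(
        IObservable<EventMessage> source, TimeSpan period, IScheduler scheduler)
    {
        var comparer = new EventComparer();
        return source
            // Collect everything received during one period.
            .Buffer(period, scheduler)
            // Drop repeats inside that batch only; later batches start fresh.
            .Select(batch => (IList<EventMessage>)batch.Distinct(comparer).ToList())
            // Skip periods in which nothing arrived.
            .Where(batch => batch.Count > 0);
    }
}
```

In the EventManager above, this would correspond to replacing the DistinctUntilChanged/Buffer pair in CreateQuery. Separately, as far as I can tell, the Moq check It.Is<List<Event>> only matches when the argument passed to Add really is a List<Event>, while HandleEvents passes the lazy result of Select, so a ToList() there may also be needed for the verification to succeed.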

I'm not sure if this does exactly what you'd like, but you may be able to group the elements explicitly using the group keyword, and then manipulate the various IObservables separately before recombining them.

E.g. if we have class definitions such as

class A
{
    public char Key { get; set; }
}

class X : A { }
...

and a Subject<A>

Subject<A> subject = new Subject<A>();

then we can write

var buffered =
    from a in subject
    group a by new { Type = a.GetType(), Key = a.Key } into g
    from buffer in g.Buffer(TimeSpan.FromMilliseconds(300))
    where buffer.Any()
    select new
    {
        Count = buffer.Count,
        Type = buffer.First().GetType().Name,
        Key = buffer.First().Key
    };

buffered.Do(Console.WriteLine).Subscribe();

We can test this with the data you provided:

subject.OnNext(new X { Key = 'a' }); 
Thread.Sleep(100);
subject.OnNext(new X { Key = 'b' }); 
Thread.Sleep(100);
subject.OnNext(new X { Key = 'a' }); 
Thread.Sleep(100);
...
subject.OnCompleted();

To get the output you provided:

{ Count = 2, Type = X, Key = a }
{ Count = 1, Type = X, Key = b }
{ Count = 1, Type = Y, Key = b }
{ Count = 1, Type = Y, Key = c }
{ Count = 2, Type = Z, Key = a }
{ Count = 2, Type = Z, Key = c }
{ Count = 1, Type = Z, Key = b }

Not sure if this is exactly what you want, but it seems to support your use cases.

First, let's define the base class to use (you can easily modify this to suit your needs):

public class MyEvent
{
    public string NetworkAddress { set; get; }
    public string EventCode { set; get; }
}

Let's set up your devices as an array of IObservable<MyEvent>. You may have these available in a different form, and the code below would of course need to change to accommodate that. Each device produces its values with a random delay of between 0.5 and 1.5 seconds.

var deviceA = new MyEvent[] { new MyEvent() {NetworkAddress = "A", EventCode = "1"},
                              new MyEvent() {NetworkAddress = "A", EventCode = "1"},
                              new MyEvent() {NetworkAddress = "A", EventCode = "2"} };

var deviceB = new MyEvent[] { new MyEvent() {NetworkAddress = "B", EventCode = "1"},
                              new MyEvent() {NetworkAddress = "B", EventCode = "2"},
                              new MyEvent() {NetworkAddress = "B", EventCode = "2"},
                              new MyEvent() {NetworkAddress = "B", EventCode = "3"} };   

var random = new Random();                                 

var deviceARand = deviceA.ToObservable().Select(a => Observable.Return(a).Delay(TimeSpan.FromMilliseconds(random.Next(500,1500)))).Concat();
var deviceBRand = deviceB.ToObservable().Select(b => Observable.Return(b).Delay(TimeSpan.FromMilliseconds(random.Next(500,1500)))).Concat();

var devices = new IObservable<MyEvent>[] { deviceARand, deviceBRand };

Now let's take all of these individual device streams, make them 'distinct', and merge them into a single master stream:

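// Apply DistinctUntilChanged to each device stream on its own, then merge the results.
// (Aggregating with acc.DistinctUntilChanged(...).Merge(device) would leave the last device unfiltered.)
var stream = devices.Select(device => device.DistinctUntilChanged(e => e.EventCode)).Merge();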

Once you have that, getting this stream to be consumed periodically is just a matter of buffering it with Buffer:

stream.Buffer(TimeSpan.FromSeconds(1)).Subscribe(x => { /* code here works on a list of the filtered events per second */ });

After some searching and experimenting, I put together some code that produces the output I expect:

static void Main(string[] args)
    {
        const string Address1 = "A:2.1.1";
        const string Address2 = "A:2.1.2";
        var comparer = new EventComparer();
        var eventMessageA1 = new EventMessage { NetworkAddress = Address1, EventCode = 1, Attribute = 4 };
        var eventMessageB1 = new EventMessage { NetworkAddress = Address2, EventCode = 1, Attribute = 5 };
        var eventMessageA2 = new EventMessage { NetworkAddress = Address1, EventCode = 1, Attribute = 5 };
        var list = new[] { eventMessageA1, eventMessageA1, eventMessageB1, eventMessageA2, eventMessageA1, eventMessageA1 };

        var queue = new BlockingCollection<EventMessage>();
        Observable.Interval(TimeSpan.FromSeconds(2)).Subscribe
            (
                l => list.ToList().ForEach(m =>
                {
                    Console.WriteLine("Producing {0} on thread {1}", m, Thread.CurrentThread.ManagedThreadId);
                    queue.Add(m);
                })
            );

        // subscribing
        queue.GetConsumingEnumerable()
            .ToObservable()
             .Buffer(TimeSpan.FromSeconds(5))
             .Subscribe(e =>
                 {
                     Console.WriteLine("Queue contains {0} items", queue.Count);
                     e.Distinct(comparer).ToList().ForEach(m =>
                  Console.WriteLine("{0} - Consuming: {1} (queue contains {2} items)", DateTime.UtcNow, m, queue.Count));
                 }
             );

        Console.WriteLine("Type enter to exit");
        Console.ReadLine();
    }

    public class EventComparer : IEqualityComparer<EventMessage>
    {
        public bool Equals(EventMessage x, EventMessage y)
        {
            var result = x.NetworkAddress == y.NetworkAddress && x.EventCode == y.EventCode && x.Attribute == y.Attribute;
            return result;
        }

        public int GetHashCode(EventMessage obj)
        {
            var s = string.Concat(obj.NetworkAddress + "_" + obj.EventCode + "_" + obj.Attribute);
            return s.GetHashCode();
        }
    }

    public class EventMessage
    {
        public string NetworkAddress { get; set; }

        public byte EventCode { get; set; }

        public byte Attribute { get; set; }

        public override string ToString()
        {
            const string Format = "{0} ({1}, {2})";
            var s = string.Format(Format, this.NetworkAddress, this.EventCode, this.Attribute);
            return s;
        }
    }

Anyway, monitoring the application, it seems that this causes a memory leak. My question is now:

  • what is causing the memory leak? [please see the update below]
  • is this the best way to do it? (If I put the distinct on the first observable, I don't get the other events in the following buffers, but the items in each buffer should be isolated from the others.)
  • how can I write a test using the test scheduler?

UPDATE:

It seems that the memory increase lasts only a few minutes, after which the value is stable. I will run a long test. Of course, this would be perfectly acceptable behavior.

UPDATE 26.08.12:

  • as I already mentioned in the previous update, the memory usage increases (slowly) only for a few minutes after startup. After 8 hours the memory consumed was stable, with normal fluctuations in the range of a few KB
  • this question is very similar to mine, and the suggested Drain extension could apply well to my problem (still to be verified)

Anyway, I think that my question is still open as regards unit tests using the test scheduler.

Thanks, Francesco
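On the open point about the test scheduler, here is a minimal sketch of how a time-based Buffer might be driven in virtual time. It assumes the Microsoft.Reactive.Testing package is referenced, and it uses plain strings as stand-ins for events (so the default string equality plays the role of EventComparer); BufferTimingSketch and Run are hypothetical names. The key idea is to pass the same TestScheduler both to Buffer and to the code that pushes values, then move the clock with AdvanceTo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using Microsoft.Reactive.Testing;

// Hypothetical harness (an assumption for illustration only).
public static class BufferTimingSketch
{
    public static List<int> Run()
    {
        var scheduler = new TestScheduler();
        var subject = new Subject<string>();
        var batchSizes = new List<int>();

        // Buffer on the virtual clock, then de-duplicate each batch.
        subject
            .Buffer(TimeSpan.FromSeconds(5), scheduler)
            .Select(batch => batch.Distinct().ToList())
            .Subscribe(batch => batchSizes.Add(batch.Count));

        // Everything is scheduled while the virtual clock is still at zero,
        // so the relative due times below are also absolute times.
        scheduler.Schedule(TimeSpan.FromSeconds(1), () => subject.OnNext("a1"));
        scheduler.Schedule(TimeSpan.FromSeconds(2), () => subject.OnNext("b1"));
        scheduler.Schedule(TimeSpan.FromSeconds(3), () => subject.OnNext("a1"));
        scheduler.Schedule(TimeSpan.FromSeconds(7), () => subject.OnNext("b1"));

        // The window closing at 5s holds a1, b1, a1: two distinct items expected.
        scheduler.AdvanceTo(TimeSpan.FromSeconds(6).Ticks);

        // The window closing at 10s holds only b1: one item expected.
        scheduler.AdvanceTo(TimeSpan.FromSeconds(11).Ticks);

        return batchSizes;
    }
}
```

In a real unit test the assertions would go after each AdvanceTo call. Scheduling every push up front avoids the ambiguity of calling Schedule with a relative time after the clock has already moved, where the due time is counted from the current virtual time rather than from zero.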
