
With ElasticSearch (C#/NEST) get all documents sorted by timestamp

I have an Elasticsearch index whose documents have a type field and a timestamp field, defined as:

public Common.MediaType Type        { get; set; }
public DateTime Timestamp           { get; set; }

How can I do a query to return X entries, matching a specific type and sorted by timestamp?

If I do this:

        var Match = Index.Driver.Search<Metadata>(_ => _
            .Query(Q => Q.Term(P => P.Type, Type))
            .Size(NumberOfItems)
            .Sort(Q => Q.Descending(P => P.Timestamp)));

I believe this fails because it returns NumberOfItems records of the right type and only then sorts them by timestamp, so it can easily miss the records with the newest timestamps.

What I want is to make the timestamp part of the query, so that I get the NumberOfItems newest records, sorted by Timestamp, that match the right type.

But I don't know how to write this...

Just so I understand correctly: you want the top 5 Metadata documents with the newest timestamps across all indices within a cluster?

If that's what you'd like, then the following will work:

var Type = "metadata-type";
var NumberOfItems = 5;

var searchResponse = client.Search<Metadata>(s => s
    .AllIndices()
    .Query(q => q
        .Term(f => f.Type, Type)
    )
    .Size(NumberOfItems)
    .Sort(sort => sort
        .Descending(f => f.Timestamp)
    )
);

AllIndices() will search for Metadata document types across all indices. If you want to search only within one specific index, or multiple specific indices, then you can replace AllIndices() with

.Index("index-1,index-2")

where the argument is a comma separated list of the index names you want to search across.

One thing to consider is that there is an interval between a document being indexed into Elasticsearch and that document becoming visible in search results. By default this refresh interval is 1 second; it can be changed per index with the refresh_interval setting, or a refresh can be triggered explicitly.

EDIT:

From your comment:

currently, If all records are of the right type and I want 5 items, it will take the first 5 and sort them; but maybe record 6 had a higher timestamp value but it will not be included in the result of the query matching the type.

How would Elasticsearch take the first 5 items? It has to order the matching items by some value first. In this case it orders them by timestamp descending, as specified in the query, and then returns the top 5 documents. It won't apply some arbitrary ordering to the documents, take the first 5, and then order only those 5 by timestamp descending.
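As an in-memory analogy only (this is plain LINQ, not Elasticsearch or NEST), the order of operations can be sketched as filter, sort all matches, then take the first N. The Doc class and the TopN.Newest helper are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a Metadata document (not the NEST type).
public class Doc
{
    public int Id { get; set; }
    public string Type { get; set; }
    public DateTime Timestamp { get; set; }
}

public static class TopN
{
    // Keep only documents of the requested type, order ALL of them by
    // timestamp descending, and only then keep the first `size` results.
    public static List<Doc> Newest(IEnumerable<Doc> docs, string type, int size)
    {
        return docs
            .Where(d => d.Type == type)
            .OrderByDescending(d => d.Timestamp)
            .Take(size)
            .ToList();
    }
}
```

With six documents of the same type whose timestamps increase with their id, TopN.Newest(docs, "metadata-type-1", 5) returns ids 6, 5, 4, 3, 2: the sixth record is included because sorting happens before the cut-off, which mirrors how Size and Sort combine in the search request.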

Here's an example to demonstrate

void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
    var defaultIndex = "default-index";
    var connectionSettings = new ConnectionSettings(pool)
            .DefaultIndex(defaultIndex);

    var client = new ElasticClient(connectionSettings);

    if (client.IndexExists(defaultIndex).Exists)
        client.DeleteIndex(defaultIndex);

    client.CreateIndex(defaultIndex, c => c
        .Mappings(m => m
            .Map<Metadata>(mm => mm.AutoMap())
        )
    );

    var metadata = Enumerable.Range(1, 1000).Select(i =>
        new Metadata
        {
            Id = i,
            Timestamp = DateTime.UtcNow.Date.AddDays(-(i-1)),
            Type = i % 2 == 0 ? "metadata-type-2" : "metadata-type-1"
        });

    client.IndexMany(metadata);
    client.Refresh(defaultIndex);

    var Type = "metadata-type-1";
    var NumberOfItems = 5;

    var searchResponse = client.Search<Metadata>(s => s
        .Query(q => q
            .Term(f => f.Type, Type)
        )
        .Size(NumberOfItems)
        .Sort(sort => sort
            .Descending(f => f.Timestamp)
        )
    );
}

public class Metadata
{
    public int Id { get; set;}

    public DateTime Timestamp { get; set;}

    [String(Index = FieldIndexOption.NotAnalyzed)]
    public string Type { get; set;}
}

For the search response, we get back

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 500,
    "max_score" : null,
    "hits" : [ {
      "_index" : "default-index",
      "_type" : "metadata",
      "_id" : "1",
      "_score" : null,
      "_source" : {
        "id" : 1,
        "timestamp" : "2016-07-01T00:00:00Z",
        "type" : "metadata-type-1"
      },
      "sort" : [ 1467331200000 ]
    }, {
      "_index" : "default-index",
      "_type" : "metadata",
      "_id" : "3",
      "_score" : null,
      "_source" : {
        "id" : 3,
        "timestamp" : "2016-06-29T00:00:00Z",
        "type" : "metadata-type-1"
      },
      "sort" : [ 1467158400000 ]
    }, {
      "_index" : "default-index",
      "_type" : "metadata",
      "_id" : "5",
      "_score" : null,
      "_source" : {
        "id" : 5,
        "timestamp" : "2016-06-27T00:00:00Z",
        "type" : "metadata-type-1"
      },
      "sort" : [ 1466985600000 ]
    }, {
      "_index" : "default-index",
      "_type" : "metadata",
      "_id" : "7",
      "_score" : null,
      "_source" : {
        "id" : 7,
        "timestamp" : "2016-06-25T00:00:00Z",
        "type" : "metadata-type-1"
      },
      "sort" : [ 1466812800000 ]
    }, {
      "_index" : "default-index",
      "_type" : "metadata",
      "_id" : "9",
      "_score" : null,
      "_source" : {
        "id" : 9,
        "timestamp" : "2016-06-23T00:00:00Z",
        "type" : "metadata-type-1"
      },
      "sort" : [ 1466640000000 ]
    } ]
  }
}

We get back documents with ids 1, 3, 5, 7, and 9, in that order, as these are the top 5 documents with the latest Timestamp value that match the query term "metadata-type-1".
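The "sort" array in each hit holds the value Elasticsearch sorted on; for a date field this is milliseconds since the Unix epoch. A minimal sketch for checking those values against the "timestamp" field follows; SortValue.ToUtcString is a hypothetical helper name, while DateTimeOffset.FromUnixTimeMilliseconds is part of .NET.

```csharp
using System;
using System.Globalization;

public static class SortValue
{
    // Convert an epoch-milliseconds sort value into an ISO 8601 UTC string.
    public static string ToUtcString(long epochMillis)
    {
        return DateTimeOffset
            .FromUnixTimeMilliseconds(epochMillis)
            .ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture);
    }
}
```

For the first hit, SortValue.ToUtcString(1467331200000) gives "2016-07-01T00:00:00Z", which matches that document's timestamp.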

You can also write Take instead of Size. In NEST, Take is an alias for Size, so both produce the same request.

var Match = Index.Driver.Search<Metadata>(_ => _
            .Query(Q => Q.Term(P => P.Type, Type))
            .Take(NumberOfItems)
            .Sort(Q => Q.Descending(P => P.Timestamp)));

Either way, Elasticsearch sorts the matching documents first and then returns the first NumberOfItems of them.
