
How best to create a parent-child relationship in Elasticsearch

I have two real-time streams. One contains news articles and the other contains comments about the same articles. I'd like to create a parent-child relationship between each article and its comments. There is no common id other than the headline, which exists in both streams, so I'd like to match the two streams on the headline every 15 minutes. I am assuming that 15 minutes is enough to absorb the delay between the two streams. How would you go about doing this? Any ideas would be appreciated.

A typical message, containing entity_name, source_name, and headline, comes through Logstash and looks like this:

"Thomson Reuters Corp.","Japan Today","Trump claims victory after forcing NATO crisis talks"

Some typical comments, containing comment and headline, come through Logstash in a separate pipeline and look like this:

"We applaud Trumps claim ...", "Trump claims victory after forcing NATO crisis talks"

"Nato crisis is important...", "Trump claims victory after forcing NATO crisis talks"

Specifically: 1. Should I keep the indexes separate, or create a third index from the first two? 2. How do I run 15-minute refresh cycles? 3. If there is a better way, tool, or data store, please advise.

You can create a common id between comments and articles by hashing the headline (assuming the headline text is identical in both streams, with no typos).
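As a minimal sketch of that idea, the snippet below derives the same id from the headline on both sides. The function name `headline_id`, the choice of SHA-1, and the whitespace/lowercase normalization are assumptions made for illustration; they are not part of the question or of any Elasticsearch API.

```python
import hashlib


def headline_id(headline: str) -> str:
    """Return a deterministic id derived from the headline text."""
    # Assumption: collapsing whitespace and lowercasing is an acceptable
    # normalization. It does not protect against real typos.
    normalized = " ".join(headline.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


# Sample records taken from the question: (entity_name, source_name, headline)
article = (
    "Thomson Reuters Corp.",
    "Japan Today",
    "Trump claims victory after forcing NATO crisis talks",
)
# (comment, headline)
comment = (
    "We applaud Trumps claim ...",
    "Trump claims victory after forcing NATO crisis talks",
)

article_id = headline_id(article[2])
comment_parent_id = headline_id(comment[1])

# Both streams now carry the same key without sharing an original id.
print(article_id == comment_parent_id)
```

The resulting hex string could be stored as the document id of the article and as an extra field on each comment, so that the two indices can be joined on it later.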

  1. Yes, keep articles and comments in separate indices. reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html

  2. I'd need more specifics on what you mean by matching the streams. I'm not sure whether jobs can be scheduled through the Elasticsearch Task API, so a cron job is probably the simpler option. The job can go through the articles index, hash each headline, and then query for that hash in the comments index.

  3. Seems like you have a solid storage method right now.
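The matching pass described in point 2 could look roughly like the sketch below. It is not a definitive implementation: plain Python lists stand in for the two indices so the logic runs without a cluster, and the names `headline_id` and `match_comments` and the record layout are hypothetical.

```python
import hashlib
from collections import defaultdict


def headline_id(headline):
    """Hash a normalized headline into a stable hex id (assumed scheme)."""
    normalized = " ".join(headline.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def match_comments(articles, comments):
    """Group comments under the article whose headline hash they share."""
    by_parent = defaultdict(list)
    for c in comments:
        by_parent[headline_id(c["headline"])].append(c["comment"])

    matched = {}
    for a in articles:
        key = headline_id(a["headline"])
        # Comments with no matching article are left out of the result.
        matched[key] = {"article": a, "comments": by_parent.get(key, [])}
    return matched


# In-memory stand-ins for the articles and comments indices.
articles = [
    {
        "entity_name": "Thomson Reuters Corp.",
        "source_name": "Japan Today",
        "headline": "Trump claims victory after forcing NATO crisis talks",
    }
]
comments = [
    {"comment": "We applaud Trumps claim ...",
     "headline": "Trump claims victory after forcing NATO crisis talks"},
    {"comment": "Nato crisis is important...",
     "headline": "Trump claims victory after forcing NATO crisis talks"},
    {"comment": "Unrelated remark", "headline": "Some other headline"},
]

result = match_comments(articles, comments)
for entry in result.values():
    print(entry["article"]["headline"], len(entry["comments"]))
```

Against a real cluster, the in-memory grouping would be replaced by a term query on the field that holds the hash in the comments index (the field name is up to you). A crontab schedule of `*/15 * * * *` would run such a script every 15 minutes.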
