Elasticsearch date histogram aggregation on a duration of time

- March 15, 2014

The documents I deal with in Elasticsearch have the concept of a duration, represented as a start and end time, e.g.:

{
  issueid: 1,
  issuepriority: 3,
  timewindow: {
    start: "2015-10-14T17:00:00-07:00",
    end: "2015-10-14T18:00:00-07:00"
  }
},
{
  issueid: 2,
  issuepriority: 1,
  timewindow: {
    start: "2015-10-14T16:50:00-07:00",
    end: "2015-10-14T17:50:00-07:00"
  }
}

My goal is to produce a histogram of the number of issues and their max priority, aggregated into 15-minute buckets. In the example above, issue #1 would be bucketized into the 17:00, 17:15, 17:30, and 17:45 buckets, no more, no less.

I tried using the date_histogram aggregation, e.g.:

aggs: {
  max_priority_over_time: {
    date_histogram: {
      field: "timewindow.start",
      interval: "15m"
    },
    aggs: {
      max_priority: ${top_hits_aggregation}
    }
  }
}

But this only bucketizes issue #1 into the 17:00 bucket. If I took timewindow.end into account instead, it would only be added to the 18:00 bucket. Does anyone know how I can accomplish this using date_histogram or other Elasticsearch aggregations? Potentially by generating a range of timestamps 15 minutes apart from timewindow.start to timewindow.end, so that they can be bucketized correctly. Thanks.

OK, since the timestamps in my data are truncated to the nearest 10 minutes, I figured I can use a nested terms aggregation instead:

aggs: {
  per_start_time: {
    terms: {
      field: "timewindow.start"
    },
    aggs: {
      per_end_time: {
        terms: {
          field: "timewindow.end"
        },
        aggs: {
          max_priority: ${top_hits_aggregation}
        }
      }
    }
  }
}

This gives me a nested bucket per start_time per end_time, e.g.:

{
  "key": 1444867800000,
  "key_as_string": "2015-10-15T00:10:00.000Z",
  "doc_count": 11,
  "per_end_time": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": 1444871400000,
        "key_as_string": "2015-10-15T01:10:00.000Z",
        "doc_count": 11,
        "max_priority": {
          "hits": {
            "total": 11,
            "max_score": 4
          }
        }
      }
    ]
  }
}

By trimming down the buckets in our backend (Ruby on Rails), we get the following results:

[
  {
    "start_time": "2015-10-14 14:40:00 -0700",
    "end_time": "2015-10-14 15:40:00 -0700",
    "max_priority": 4,
    "count": 12
  }
],
[
  {
    "start_time": "2015-10-14 14:50:00 -0700",
    "end_time": "2015-10-14 15:50:00 -0700",
    "max_priority": 4,
    "count": 12
  }
],
...
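
The trimming code itself is not shown here. As a rough illustration only (in Python rather than the Rails backend), flattening the nested buckets might look like the sketch below. The function name flatten_buckets is hypothetical, and the sketch assumes the top_hits sub-aggregation is set up so that hits.max_score carries the priority, as the sample response suggests. It returns one flat list of rows with UTC datetimes rather than the exact layout above.

```python
from datetime import datetime, timezone


def flatten_buckets(per_start_time_buckets):
    """Flatten nested per_start_time/per_end_time buckets into flat rows.

    Assumption: hits.max_score of the top_hits sub-aggregation holds the
    highest priority in the bucket (inferred from the sample response).
    """
    rows = []
    for start_bucket in per_start_time_buckets:
        # Bucket keys for date fields are epoch milliseconds.
        start = datetime.fromtimestamp(start_bucket["key"] / 1000, tz=timezone.utc)
        for end_bucket in start_bucket["per_end_time"]["buckets"]:
            end = datetime.fromtimestamp(end_bucket["key"] / 1000, tz=timezone.utc)
            rows.append({
                "start_time": start,
                "end_time": end,
                "max_priority": end_bucket["max_priority"]["hits"]["max_score"],
                "count": end_bucket["doc_count"],
            })
    return rows
```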

These can be map/reduced further into a date histogram with arbitrary time buckets, outside of Elasticsearch of course. If timewindow.start, timewindow.end, and the window duration were completely arbitrary in time, I guess this would be equivalent to fetching everything and doing the counting in the backend (since it would generate one nested time bucket per document). Fortunately, the timestamps I deal with are somewhat predictable, so I can take this hybrid approach.
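
The final re-bucketing step outside Elasticsearch could be sketched as follows, again in Python purely for illustration. The names bucket_starts and rebucket are hypothetical, and the sketch assumes rows shaped like the trimmed results above (timezone-aware start_time and end_time, a numeric max_priority, and a count). Each window is treated as half-open, [start, end), so a window ending exactly at 18:00 does not land in the 18:00 bucket.

```python
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_starts(start, end, width=BUCKET):
    """Yield the start of every bucket that overlaps the window [start, end)."""
    # Floor the window start to a multiple of `width` since the epoch.
    current = start - (start - EPOCH) % width
    while current < end:
        yield current
        current += width


def rebucket(rows, width=BUCKET):
    """Sum counts and take the max priority per bucket, ordered by bucket start."""
    histogram = {}
    for row in rows:
        for key in bucket_starts(row["start_time"], row["end_time"], width):
            entry = histogram.setdefault(
                key, {"count": 0, "max_priority": row["max_priority"]}
            )
            entry["count"] += row["count"]
            entry["max_priority"] = max(entry["max_priority"], row["max_priority"])
    return dict(sorted(histogram.items()))
```

With the two sample rows shown earlier (14:40 to 15:40 and 14:50 to 15:50, 12 documents each), the 14:30 and 15:45 buckets would each hold 12 documents, while the buckets from 14:45 through 15:30 would hold 24, since both windows overlap them.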
