Elasticsearch date histogram aggregation on a duration of time
The documents I deal with in Elasticsearch have a concept of duration, represented as a start and end time, e.g.:
{ issueid: 1, issuepriority: 3, timewindow: { start: "2015-10-14T17:00:00-07:00", end: "2015-10-14T18:00:00-07:00" } },
{ issueid: 2, issuepriority: 1, timewindow: { start: "2015-10-14T16:50:00-07:00", end: "2015-10-14T17:50:00-07:00" } }
My goal is to produce a histogram of the number of issues and their max priority, aggregated into 15-minute buckets. In the example above, issue #1 should be bucketized into the 17:00, 17:15, 17:30, and 17:45 buckets, no more, no less.
I tried using the date_histogram aggregation, e.g.:
aggs: { max_priority_over_time: { date_histogram: { field: "timewindow.start", interval: "15m" }, aggs: { max_priority: ${top_hits_aggregation} } } }
But this only bucketizes issue #1 into the 17:00 bucket. Even if I take timewindow.end into account, it would only be added to the 18:00 bucket. Does anyone know how I can accomplish this using date_histogram or other Elasticsearch aggregations? Potentially by generating a range of timestamps 15 minutes apart from timewindow.start to timewindow.end, so that they can be bucketized correctly. Thanks.
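The idea of generating timestamps 15 minutes apart between timewindow.start and timewindow.end can be sketched outside of Elasticsearch as follows. This is a minimal Python sketch, not an Elasticsearch feature: the function name bucket_starts is hypothetical, and it assumes the window is half-open (the end time itself does not open a new bucket), which is what puts issue #1 into exactly four buckets.

```python
from datetime import datetime, timedelta

def bucket_starts(start, end, minutes=15):
    """Return the start of every `minutes`-wide bucket overlapping [start, end)."""
    step = timedelta(minutes=minutes)
    # Floor `start` to a bucket boundary, counting from midnight of its own day.
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
    current = midnight + ((start - midnight) // step) * step
    buckets = []
    while current < end:  # assumption: the end time is exclusive
        buckets.append(current)
        current += step
    return buckets

# Issue #1 from the example documents above.
issue_1 = bucket_starts(
    datetime.fromisoformat("2015-10-14T17:00:00-07:00"),
    datetime.fromisoformat("2015-10-14T18:00:00-07:00"),
)
print([b.strftime("%H:%M") for b in issue_1])
# ['17:00', '17:15', '17:30', '17:45']
```

With the same assumptions, issue #2 (16:50 to 17:50) lands in five buckets, starting at 16:45, because its start time is floored to the previous 15-minute boundary.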
OK, since the timestamps in my data are truncated to the nearest 10 minutes, I figured I can use a nested terms aggregation instead:
aggs: { per_start_time: { terms: { field: "timewindow.start" }, aggs: { per_end_time: { terms: { field: "timewindow.end" }, aggs: { max_priority: ${top_hits_aggregation} } } } } }
This gives me a nested bucket per start_time per end_time, e.g.:
{ "key": 1444867800000, "key_as_string": "2015-10-15T00:10:00.000Z", "doc_count": 11, "per_end_time": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": 1444871400000, "key_as_string": "2015-10-15T01:10:00.000Z", "doc_count": 11, "max_priority": { "hits": { "total": 11, "max_score": 4 } } } ] } }
By trimming down the buckets in our backend (Ruby on Rails), I get the following results:
[ { "start_time": "2015-10-14 14:40:00 -0700", "end_time": "2015-10-14 15:40:00 -0700", "max_priority": 4, "count": 12 } ], [ { "start_time": "2015-10-14 14:50:00 -0700", "end_time": "2015-10-14 15:50:00 -0700", "max_priority": 4, "count": 12 } ], ...
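The trimming step could look roughly like the sketch below. It is written in Python rather than the Ruby on Rails backend mentioned above, flatten_buckets is a hypothetical name, and it assumes the priority can be read from max_score of the top_hits sub-aggregation, as the sample response suggests. It also keeps the times in UTC instead of converting them to -0700.

```python
from datetime import datetime, timezone

def flatten_buckets(start_buckets):
    """Turn nested per_start_time / per_end_time buckets into flat rows."""
    rows = []
    for start_bucket in start_buckets:
        for end_bucket in start_bucket["per_end_time"]["buckets"]:
            rows.append({
                # Bucket keys on date fields are epoch milliseconds.
                "start_time": datetime.fromtimestamp(start_bucket["key"] / 1000, tz=timezone.utc),
                "end_time": datetime.fromtimestamp(end_bucket["key"] / 1000, tz=timezone.utc),
                # Assumption: max_score carries the max priority of the bucket.
                "max_priority": end_bucket["max_priority"]["hits"]["max_score"],
                "count": end_bucket["doc_count"],
            })
    return rows

# The sample bucket from the response above, reduced to the fields that are read.
sample = [{
    "key": 1444867800000,
    "per_end_time": {"buckets": [{
        "key": 1444871400000,
        "doc_count": 11,
        "max_priority": {"hits": {"total": 11, "max_score": 4}},
    }]},
}]
rows = flatten_buckets(sample)
print(rows[0]["start_time"].isoformat(), rows[0]["count"])
```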
These results can then be map/reduced further into a date histogram with arbitrary time buckets, outside of Elasticsearch of course. If timewindow.start, timewindow.end, and the window duration were completely arbitrary in time, I guess it'd be the equivalent of fetching everything and doing the counting in the backend (since it would generate almost 1 nested time bucket per document). Fortunately, the timestamps I deal with are somewhat predictable, so I can take this hybrid approach.
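The final map/reduce step could be sketched as follows, again in Python rather than Ruby. The function name histogram is hypothetical, and each window is assumed to be half-open, so a row is counted in every 15-minute bucket it overlaps except one that would start exactly at its end time. The two input rows are the ones shown in the results above.

```python
from datetime import datetime, timedelta

def histogram(rows, minutes=15):
    """Spread each (start_time, end_time) row over fixed-width buckets."""
    step = timedelta(minutes=minutes)
    buckets = {}
    for row in rows:
        start, end = row["start_time"], row["end_time"]
        # Floor the start to a bucket boundary, counting from midnight.
        midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
        current = midnight + ((start - midnight) // step) * step
        while current < end:  # assumption: the end time is exclusive
            bucket = buckets.setdefault(current, {"count": 0, "max_priority": None})
            bucket["count"] += row["count"]
            if bucket["max_priority"] is None or row["max_priority"] > bucket["max_priority"]:
                bucket["max_priority"] = row["max_priority"]
            current += step
    return dict(sorted(buckets.items()))

rows = [
    {"start_time": datetime.fromisoformat("2015-10-14T14:40:00-07:00"),
     "end_time": datetime.fromisoformat("2015-10-14T15:40:00-07:00"),
     "max_priority": 4, "count": 12},
    {"start_time": datetime.fromisoformat("2015-10-14T14:50:00-07:00"),
     "end_time": datetime.fromisoformat("2015-10-14T15:50:00-07:00"),
     "max_priority": 4, "count": 12},
]
result = histogram(rows)
for bucket_start, stats in result.items():
    print(bucket_start.strftime("%H:%M"), stats["count"], stats["max_priority"])
```

For these two rows the 14:30 and 15:45 buckets each hold 12 issues, and the four buckets in between hold 24, since both windows overlap them.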