R - Elastic: make a light count query (vs. search query)


I am accessing bulk data in Elastic through R. For analytics purposes I need to query data over a relatively long duration (say, a month). The data for a month is approx. 4.5 million rows, and R goes out of memory.

Sample code below (for 1 day):

    library(elastic)

    # Build the day's boundaries as "YYYY-MM-DD" strings
    dt <- as.Date("2015-09-01", "%Y-%m-%d")
    frmdt <- strftime(dt, "%Y-%m-%d")
    todt <- as.Date(dt + 1)
    todt <- strftime(todt, "%Y-%m-%d")

    connect(es_base = "http://xx.yy.zzz.kk")

    # Convert both boundaries to epoch milliseconds (local time zone)
    start_date <- as.integer(as.POSIXct(frmdt)) * 1000
    end_date <- as.integer(as.POSIXct(todt)) * 1000

    # Range query on "time", returning only the two fields of interest
    query <- sprintf('{"query":{"range":{"time":{"gte":"%s","lte":"%s"}}}}', start_date, end_date)
    s_list <- elastic::Search(index = "organised_2015_09", type = "property_search", body = query,
                              fields = c("trackid", "time"), size = 1000000)$hits$hits
    length(s_list)
    [1] 144612

The result for 1 day has 144k records and takes 222 MB. A sample list item is below:

    > s_list[[1]]
    $`_index`
    [1] "organised_2015_09"

    $`_type`
    [1] "property_search"

    $`_id`
    [1] "1441122918941"

    $`_version`
    [1] 1

    $`_score`
    [1] 1

    $fields
    $fields$time
    $fields$time[[1]]
    [1] 1441122918941


    $fields$trackid
    $fields$trackid[[1]]
    [1] "fd4b4ce88101e58623ba9e6e31971d1f"
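Much of that size comes from the nesting: every hit is a list of lists. A minimal sketch of flattening such a result into a data frame and counting records per day on the client is below. The `s_list` here is a two-item mock of the structure shown above (the second item is made up), `hits_df` is a hypothetical name, and the day is computed in UTC, which is an assumption.

```r
# Two-item mock of the hit structure shown above; the second item is invented.
s_list <- list(
  list(`_id` = "1441122918941",
       fields = list(time = list(1441122918941),
                     trackid = list("fd4b4ce88101e58623ba9e6e31971d1f"))),
  list(`_id` = "1441122999999",
       fields = list(time = list(1441122999999),
                     trackid = list("00000000000000000000000000000000")))
)

# Pull each field out of the nested lists into a flat atomic vector.
hits_df <- data.frame(
  trackid = vapply(s_list, function(h) h$fields$trackid[[1]], character(1)),
  time    = vapply(s_list, function(h) h$fields$time[[1]], numeric(1)),
  stringsAsFactors = FALSE
)

# Epoch milliseconds -> calendar day (UTC is an assumption; adjust tz as needed).
hits_df$day <- as.Date(
  as.POSIXct(hits_df$time / 1000, origin = "1970-01-01", tz = "UTC"),
  tz = "UTC"
)

# Number of records per day, computed client-side.
daily_counts <- table(hits_df$day)
print(daily_counts)

# Compare the in-memory footprint of the two representations.
print(object.size(s_list), units = "Kb")
print(object.size(hits_df), units = "Kb")
```

This only shrinks the object after it has been downloaded; the full result still has to travel over the wire first.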

Actually, a summary count of the number of items by "trackid" and "time" (summarized for every day) would suffice for the analytics purpose. Hence I tried to transform this into a count query with aggregations. I constructed the query below:

    # size 0: return no hits, only the aggregation buckets
    query <- '{
      "size": 0,
      "query": {
        "filtered": {
          "query": { "match_all": {} },
          "filter": {
            "range": {
              "time": { "gte": 1441045800000, "lte": 1443551400000 }
            }
          }
        }
      },
      "aggs": {
        "articles_over_time": {
          "date_histogram": {
            "field": "time",
            "interval": "day",
            "time_zone": "+05:30"
          },
          "aggs": {
            "group_by_state": {
              "terms": { "field": "trackid", "size": 0 }
            }
          }
        }
      }
    }'

    response <- elastic::Search(index = "organised_recent", type = "property_search",
                                body = query, search_type = "count")

However, I did not gain in speed or document size. I think I am missing something, but I am not sure what.
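For reference, here is a minimal sketch of how the response to an aggregation like this could be flattened into one row per (day, trackid). The `response` object is a hand-written mock in the nested shape Elasticsearch uses for a `date_histogram` with a `terms` sub-aggregation. The keys "aaa" and "bbb", all the counts, and the names `counts` and `day_totals` are made up for illustration.

```r
# Hand-written mock of an aggregation response; all values are invented.
response <- list(
  hits = list(total = 4, hits = list()),
  aggregations = list(
    articles_over_time = list(
      buckets = list(
        list(key = 1441045800000, doc_count = 3,
             group_by_state = list(buckets = list(
               list(key = "aaa", doc_count = 2),
               list(key = "bbb", doc_count = 1)))),
        list(key = 1441132200000, doc_count = 1,
             group_by_state = list(buckets = list(
               list(key = "aaa", doc_count = 1))))
      )
    )
  )
)

day_buckets <- response$aggregations$articles_over_time$buckets

# One row per (day, trackid) with its document count.
counts <- do.call(rbind, lapply(day_buckets, function(day) {
  inner <- day$group_by_state$buckets
  data.frame(
    day_ms  = rep(day$key, length(inner)),
    trackid = vapply(inner, function(b) b$key, character(1)),
    n       = vapply(inner, function(b) as.numeric(b$doc_count), numeric(1)),
    stringsAsFactors = FALSE
  )
}))

# Per-day totals are already present on the outer buckets.
day_totals <- data.frame(
  day_ms = vapply(day_buckets, function(d) d$key, numeric(1)),
  n      = vapply(day_buckets, function(d) as.numeric(d$doc_count), numeric(1))
)

print(counts)
print(day_totals)
```

With this shape the response holds one inner bucket per distinct "trackid" per day, so its size depends on how many distinct "trackid" values there are.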

