R: time aggregate with start/stop times


I have a set of time series data with start and stop times. Each event can last from a few seconds to a few days. For every hour, I need to calculate a sum (in this example, the total memory used) over the jobs that are active at that time. Here is a sample of the data:

```
mem_used  start_time           stop_time
16        2015-10-24 17:24:41  2015-10-25 04:19:44
80        2015-10-24 17:24:51  2015-10-25 03:14:59
44        2015-10-24 17:25:27  2015-10-25 01:16:10
28        2015-10-24 17:25:43  2015-10-25 00:00:31
72        2015-10-24 17:30:23  2015-10-24 23:58:31
```
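To make the sample reproducible, it can be typed in directly as a data frame. This is a base-R sketch; the object name `df` matches the answer below, and reading the times as UTC is an assumption (in a local time zone with a clock change on 2015-10-25, the hourly labels could become ambiguous).

```r
# Sample data from the question; times are assumed to be UTC
df <- data.frame(
  mem_used   = c(16L, 80L, 44L, 28L, 72L),
  start_time = as.POSIXct(c("2015-10-24 17:24:41", "2015-10-24 17:24:51",
                            "2015-10-24 17:25:27", "2015-10-24 17:25:43",
                            "2015-10-24 17:30:23"), tz = "UTC"),
  stop_time  = as.POSIXct(c("2015-10-25 04:19:44", "2015-10-25 03:14:59",
                            "2015-10-25 01:16:10", "2015-10-25 00:00:31",
                            "2015-10-24 23:58:31"), tz = "UTC")
)
str(df)
```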

In this case, it should give something like:

```
time                 total_mem
2015-10-24 17:00:00  240
2015-10-24 18:00:00  240
...
2015-10-25 00:00:00  168
2015-10-25 01:00:00  140
2015-10-25 02:00:00   96
2015-10-25 03:00:00   96
2015-10-25 04:00:00   16
```

I'm trying to do this with the aggregate function but can't figure it out. Any ideas? Thanks.

Here is a solution based on dplyr and lubridate. Make sure you first have the data in the right format (e.g. the dates as POSIXct):

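```r
library(dplyr)
library(lubridate)

# rename stop_time to end_time and parse the times as POSIXct
# (skip the mutate() if the columns are already POSIXct)
df <- df %>%
  rename(end_time = stop_time) %>%
  mutate(start_time = ymd_hms(start_time),
         end_time   = ymd_hms(end_time))

glimpse(df)
## Observations: 5
## Variables: 3
## $ mem_used   (int) 16, 80, 44, 28, 72
## $ start_time (time) 2015-10-24 17:24:41, 2015-10-24 17:24:51...
## $ end_time   (time) 2015-10-25 04:19:44, 2015-10-25 03:14:59...
```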

Then keep only the hour (removing minutes and seconds), since you want to aggregate per hour:

```r
### remove minutes and seconds
minute(df$start_time) <- 0
second(df$start_time) <- 0
minute(df$end_time) <- 0
second(df$end_time) <- 0
```
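As an alternative to zeroing minutes and seconds field by field, base R's `trunc()` can round a POSIXct vector down to the start of its hour. `floor_hour` is a hypothetical helper name used only in this sketch; if lubridate is already loaded, `floor_date(x, unit = "hour")` does the same job.

```r
# Hypothetical helper: round POSIXct times down to the start of their hour.
# trunc() returns POSIXlt; as.POSIXct() converts back, keeping the time zone.
floor_hour <- function(x) as.POSIXct(trunc(x, units = "hours"))

x <- as.POSIXct(c("2015-10-24 17:24:41", "2015-10-25 04:19:44"), tz = "UTC")
floor_hour(x)
```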

Now the important step: create a new data.frame with one row for each hour between start_time and end_time. For example, if the first line of the original data.frame has 5 hours between start_time and end_time, you end up with 5 rows, with the value of mem_used duplicated 5 times.

```r
### expand each job into one row per active hour
n <- nrow(df)
l <- lapply(1:n, function(i) {
  # hourly timestamps from start_time to end_time, both inclusive
  date <- seq.POSIXt(df$start_time[i], df$end_time[i], by = "hour")
  mem_used <- rep(df$mem_used[i], length(date))
  data.frame(time = date, mem_used = mem_used)
})

df <- Reduce(rbind, l)
glimpse(df)
## Observations: 47
## Variables: 2
## $ time     (time) 2015-10-24 17:00:00, 2015-10-24 18:00:00, ...
## $ mem_used (int) 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,...
```
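The same expansion can be sketched in base R without building one small data frame per job: compute each job's hourly sequence, then flatten everything once with `unlist()` and `rep(..., lengths(...))`. This avoids the repeated copying of `Reduce(rbind, l)`, which rbinds the growing result pairwise. The data below are the sample jobs with times already floored to the hour; the name `jobs` and the UTC time zone are assumptions of this sketch.

```r
# Sample jobs with times already floored to the hour (UTC assumed)
jobs <- data.frame(
  mem_used   = c(16L, 80L, 44L, 28L, 72L),
  start_time = as.POSIXct(rep("2015-10-24 17:00:00", 5), tz = "UTC"),
  end_time   = as.POSIXct(c("2015-10-25 04:00:00", "2015-10-25 03:00:00",
                            "2015-10-25 01:00:00", "2015-10-25 00:00:00",
                            "2015-10-24 23:00:00"), tz = "UTC")
)

# One vector of hourly timestamps per job, start and end hour inclusive
hours <- lapply(seq_len(nrow(jobs)), function(i)
  seq(jobs$start_time[i], jobs$end_time[i], by = "hour"))

# Flatten once: unlist() drops the POSIXct class, so restore it with .POSIXct()
expanded <- data.frame(
  time     = .POSIXct(unlist(hours), tz = "UTC"),
  mem_used = rep(jobs$mem_used, lengths(hours))
)
nrow(expanded)
```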

Finally, you can aggregate using dplyr or aggregate (or other similar functions):

```r
df %>%
  group_by(time) %>%
  summarise(tot = sum(mem_used))
##                   time   tot
##                 (time) (int)
## 1  2015-10-24 17:00:00   240
## 2  2015-10-24 18:00:00   240
## 3  2015-10-24 19:00:00   240
## 4  2015-10-24 20:00:00   240
## 5  2015-10-24 21:00:00   240
## 6  2015-10-24 22:00:00   240
## 7  2015-10-24 23:00:00   240
## 8  2015-10-25 00:00:00   168
## 9  2015-10-25 01:00:00   140
## 10 2015-10-25 02:00:00    96
## 11 2015-10-25 03:00:00    96
## 12 2015-10-25 04:00:00    16

## or with base R's aggregate (note the capitalized FUN argument)
aggregate(mem_used ~ time, FUN = sum, data = df)
```
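If expanding every job into hourly rows gets too large (jobs lasting days produce many rows), a different technique is to skip the expansion: build the hourly grid once and, for each hour, sum `mem_used` over the jobs whose floored start and end bracket it. This base-R sketch assumes the times have already been floored to the hour as above; `jobs` and `grid` are illustrative names. Unlike `group_by()`, it also reports hours with no active job, as 0.

```r
# Sample jobs with times already floored to the hour (UTC assumed)
jobs <- data.frame(
  mem_used   = c(16L, 80L, 44L, 28L, 72L),
  start_time = as.POSIXct(rep("2015-10-24 17:00:00", 5), tz = "UTC"),
  end_time   = as.POSIXct(c("2015-10-25 04:00:00", "2015-10-25 03:00:00",
                            "2015-10-25 01:00:00", "2015-10-25 00:00:00",
                            "2015-10-24 23:00:00"), tz = "UTC")
)

# Every hour from the earliest start to the latest end
grid <- seq(min(jobs$start_time), max(jobs$end_time), by = "hour")

# For each hour, add up the memory of the jobs active during it.
# Index by position so that grid[k] keeps its POSIXct class.
total_mem <- vapply(seq_along(grid), function(k) {
  h <- grid[k]
  sum(jobs$mem_used[jobs$start_time <= h & jobs$end_time >= h])
}, integer(1))

result <- data.frame(time = grid, total_mem = total_mem)
print(result)
```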
