R: aggregating time series data with start/stop times
I have a set of time series data where each event has a start and a stop time. Each event can last anywhere from a few seconds to a few days. For every hour, I need to calculate a sum (in this example, the total memory used) over the jobs that were active at that time. Here is a sample of the data:
mem_used  start_time           stop_time
16        2015-10-24 17:24:41  2015-10-25 04:19:44
80        2015-10-24 17:24:51  2015-10-25 03:14:59
44        2015-10-24 17:25:27  2015-10-25 01:16:10
28        2015-10-24 17:25:43  2015-10-25 00:00:31
72        2015-10-24 17:30:23  2015-10-24 23:58:31
In this case, the result should look something like this:
time                 total_mem
2015-10-24 17:00:00  240
2015-10-24 18:00:00  240
...
2015-10-25 00:00:00  168
2015-10-25 01:00:00  140
2015-10-25 02:00:00  96
2015-10-25 03:00:00  96
2015-10-25 04:00:00  16
I have been trying the aggregate function but cannot figure it out. Any ideas? Thanks.
Here is a solution based on dplyr and lubridate. First make sure your data is in the right format (e.g. the dates stored as POSIXct). Note that the stop column is called end_time in the code that follows.
    library(dplyr)
    library(lubridate)

    glimpse(df)
    ## Observations: 5
    ## Variables: 3
    ## $ mem_used   (int) 16, 80, 44, 28, 72
    ## $ start_time (time) 2015-10-24 17:24:41, 2015-10-24 17:24:51...
    ## $ end_time   (time) 2015-10-25 04:19:44, 2015-10-25 03:14:59...
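For reference, a data frame matching that output could be built from the sample in the question roughly as follows. This is only a sketch: the time zone is not given in the question, so UTC is an assumption here (it also sidesteps daylight-saving changes, which some local time zones had on 2015-10-25).

```r
# Sample data from the question.
# tz = "UTC" is an assumption; the question does not state a time zone.
df <- data.frame(
  mem_used   = c(16L, 80L, 44L, 28L, 72L),
  start_time = as.POSIXct(c("2015-10-24 17:24:41", "2015-10-24 17:24:51",
                            "2015-10-24 17:25:27", "2015-10-24 17:25:43",
                            "2015-10-24 17:30:23"), tz = "UTC"),
  end_time   = as.POSIXct(c("2015-10-25 04:19:44", "2015-10-25 03:14:59",
                            "2015-10-25 01:16:10", "2015-10-25 00:00:31",
                            "2015-10-24 23:58:31"), tz = "UTC")
)

# Both time columns should now be POSIXct
str(df)
```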
Then keep only the hour (dropping minutes and seconds), since we want to aggregate per hour.
    ### remove minutes and seconds
    minute(df$start_time) <- 0
    second(df$start_time) <- 0
    minute(df$end_time) <- 0
    second(df$end_time) <- 0
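If you would rather not modify the columns piece by piece, the same flooring can probably be done in one step with base R's trunc(). This is a swapped-in alternative, not what the answer above uses, and the two timestamps are just the first job from the question, assumed to be in UTC.

```r
# First job from the question; tz = "UTC" is an assumption
x <- as.POSIXct(c("2015-10-24 17:24:41", "2015-10-25 04:19:44"), tz = "UTC")

# trunc() on a date-time returns POSIXlt, so convert back to POSIXct
x_hour <- as.POSIXct(trunc(x, units = "hours"))

# Minutes and seconds are now zero
format(x_hour, "%Y-%m-%d %H:%M:%S")
```

lubridate's floor_date(x, "hour") should give the same result if you prefer to stay within lubridate.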
The important step comes now: create a new data.frame with 1 row for each hour between start_time and end_time. For example, if the first line of the original data.frame covers 5 hourly slots from start_time to end_time (both ends included), we end up with 5 rows and the value of mem_used duplicated 5 times.
    ### one row per hour for each job
    n <- nrow(df)
    l <- lapply(1:n, function(i) {
      date <- seq.POSIXt(df$start_time[i], df$end_time[i], by = "hour")
      mem_used <- rep(df$mem_used[i], length(date))
      data.frame(time = date, mem_used = mem_used)
    })
    df <- Reduce(rbind, l)

    glimpse(df)
    ## Observations: 47
    ## Variables: 2
    ## $ time     (time) 2015-10-24 17:00:00, 2015-10-24 18:00:00, ...
    ## $ mem_used (int) 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,...
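To see what the expansion does for a single job, here is a small sketch using the first job from the question after flooring to the hour (17:00 to 04:00 the next day, assumed UTC). Note that seq() includes both endpoints, so the hour in which the job stops is counted too.

```r
# First job after flooring; tz = "UTC" is an assumption
start <- as.POSIXct("2015-10-24 17:00:00", tz = "UTC")
end   <- as.POSIXct("2015-10-25 04:00:00", tz = "UTC")

# seq() dispatches to seq.POSIXt for date-times; both ends are included
hours <- seq(start, end, by = "hour")
length(hours)  # 12: seven slots on the 24th (17..23), five on the 25th (00..04)

# The job's memory value is repeated once per hourly slot
one_job <- data.frame(time = hours, mem_used = rep(16L, length(hours)))
head(one_job)
```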
Finally, we can aggregate using dplyr or aggregate (or other similar functions).
    df %>%
      group_by(time) %>%
      summarise(tot = sum(mem_used))
    ##                   time   tot
    ##                 (time) (int)
    ## 1  2015-10-24 17:00:00   240
    ## 2  2015-10-24 18:00:00   240
    ## 3  2015-10-24 19:00:00   240
    ## 4  2015-10-24 20:00:00   240
    ## 5  2015-10-24 21:00:00   240
    ## 6  2015-10-24 22:00:00   240
    ## 7  2015-10-24 23:00:00   240
    ## 8  2015-10-25 00:00:00   168
    ## 9  2015-10-25 01:00:00   140
    ## 10 2015-10-25 02:00:00    96
    ## 11 2015-10-25 03:00:00    96
    ## 12 2015-10-25 04:00:00    16

    ### or aggregate
    aggregate(mem_used ~ time, FUN = sum, data = df)
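Putting the steps together, a base-R-only version might look like the sketch below. The helper name hourly_total is made up for illustration, the timestamps are assumed to be UTC, and do.call(rbind, ...) is used in place of Reduce(rbind, ...) as a matter of taste. Like the answer above, it counts a job's full mem_used in every hour the job touches, which is what the expected output in the question implies.

```r
# hourly_total() is a hypothetical helper name, not part of any package.
# Expects columns mem_used, start_time and end_time (POSIXct).
hourly_total <- function(jobs) {
  # Floor both endpoints to the hour
  start <- as.POSIXct(trunc(jobs$start_time, units = "hours"))
  end   <- as.POSIXct(trunc(jobs$end_time,   units = "hours"))

  # One small data frame per job, one row per hour the job touches
  pieces <- lapply(seq_len(nrow(jobs)), function(i) {
    data.frame(time = seq(start[i], end[i], by = "hour"),
               mem_used = jobs$mem_used[i])
  })
  expanded <- do.call(rbind, pieces)

  # Sum memory over all jobs active in each hour
  aggregate(mem_used ~ time, data = expanded, FUN = sum)
}

# Sample data from the question; tz = "UTC" is an assumption
jobs <- data.frame(
  mem_used   = c(16, 80, 44, 28, 72),
  start_time = as.POSIXct(c("2015-10-24 17:24:41", "2015-10-24 17:24:51",
                            "2015-10-24 17:25:27", "2015-10-24 17:25:43",
                            "2015-10-24 17:30:23"), tz = "UTC"),
  end_time   = as.POSIXct(c("2015-10-25 04:19:44", "2015-10-25 03:14:59",
                            "2015-10-25 01:16:10", "2015-10-25 00:00:31",
                            "2015-10-24 23:58:31"), tz = "UTC")
)

res <- hourly_total(jobs)
res
```

On the sample data this should give the same twelve hourly totals as the dplyr output above.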