r - Counting co-occurrence of strings in lists of lists
Assume I have a list of lists as follows:
$`1`
[1] "john"
[2] "maria"

$`2`
[1] "john"
[2] "maria"

$`3`
[1] "john"
[2] "carlos"
I am trying to figure out which names have occurred together in a sublist. For example, "john" and "maria" occurred together twice, so the sublists with those names should get a score of 2, whereas "john" and "carlos" occurred together only once, so that sublist should get a score of 1. The expected output would be:
$`1`
[1] 2

$`2`
[1] 2

$`3`
[1] 1
Also, assume there can be an unlimited number of names in each sublist. The key is to identify instances where two names occur together more than once, and give them an additional "point" each time they co-occur.
I would first generate the pairs of names in each list using lapply and combn:
(pdat <- lapply(dat, function(x) {
  y <- combn(sort(x), 2)
  paste(y[1,], y[2,])
}))
# [[1]]
# [1] "john maria"
#
# [[2]]
# [1] "john maria"
#
# [[3]]
# [1] "carlos john"
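To see what the pair-generation step produces when a sublist holds more than two names, here is a minimal sketch. The three-name vector x is a made-up example, not part of the original data. Sorting before calling combn means the same two names always give the same key, whatever order they appear in.

```r
# Hypothetical sublist with three names (not from the original data)
x <- c("maria", "john", "carlos")

# combn() returns a 2-row matrix with one column per unordered pair
y <- combn(sort(x), 2)

# Paste the two rows together column by column: one key per pair
pairs <- paste(y[1, ], y[2, ])
print(pairs)
# [1] "carlos john"  "carlos maria" "john maria"
```

A sublist of n names yields choose(n, 2) pair keys, so three names give three pairs.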
Then generate the count of each pair using table and unlist:
(tab <- table(unlist(pdat)))
#
# carlos john  john maria
#           1           2
Finally, compute the score for each element in the list by summing the frequencies:
sapply(pdat, function(x) sum(tab[x]))
# [1] 2 2 1
Data:
(dat <- list(c("john", "maria"), c("john", "maria"), c("john", "carlos")))
# [[1]]
# [1] "john"  "maria"
#
# [[2]]
# [1] "john"  "maria"
#
# [[3]]
# [1] "john"   "carlos"
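The three steps can also be wrapped into one function, sketched below. The name score_cooccurrence and the second test list are made up for illustration. The guard for sublists with fewer than two names is an addition that is not in the answer above: combn() stops with an error when asked for pairs from a single name, so such sublists are given no pairs and a score of 0 here.

```r
# score_cooccurrence() is a hypothetical helper name, not from the original answer
score_cooccurrence <- function(dat) {
  # Step 1: build sorted "a b" pair keys for every sublist
  pdat <- lapply(dat, function(x) {
    # Added assumption: a sublist with fewer than two names has no pairs
    if (length(x) < 2) return(character(0))
    y <- combn(sort(x), 2)
    paste(y[1, ], y[2, ])
  })
  # Step 2: count how often each pair key occurs across all sublists
  tab <- table(unlist(pdat))
  # Step 3: sum the counts of the pairs found in each sublist
  vapply(pdat, function(p) sum(tab[p]), numeric(1))
}

# Original data from the question
dat <- list(c("john", "maria"), c("john", "maria"), c("john", "carlos"))
print(score_cooccurrence(dat))
# [1] 2 2 1

# Made-up data with a three-name sublist and a single-name sublist
dat2 <- list(c("john", "maria"),
             c("john", "maria", "carlos"),
             c("john", "carlos"),
             "maria")
scores <- score_cooccurrence(dat2)
print(scores)
# [1] 2 5 2 0
```

With more than two names, a sublist's score is the sum over all of its pairs, so the three-name sublist collects 2 + 1 + 2 = 5. If a per-pair breakdown is wanted instead of one total per sublist, tab[p] could be returned directly rather than summed.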