r - Counting co-occurrence of strings in lists of lists


Assume I have a list of lists as follows:

    $`1`
    [1] "john"  "maria"

    $`2`
    [1] "john"  "maria"

    $`3`
    [1] "john"   "carlos"

I am then trying to figure out which names have occurred together in a sublist. For example, "john" and "maria" occurred together twice, so those sublists should get a score of 2, whereas "john" and "carlos" occurred together once, so that sublist should get a score of 1. The expected output would be:

    $`1`
    [1] 2

    $`2`
    [1] 2

    $`3`
    [1] 1

Also, assume there can be an unlimited number of names in each sublist. The key is to identify instances where two names occur together more than once, and to give them an additional "point" each time they co-occur.

I would first generate all pairs of names in each list element using lapply and combn:

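    (pdat <- lapply(dat, function(x) {
      # sort first so each pair is always keyed in the same order
      y <- combn(sort(x), 2)   # one column per pair of names
      paste(y[1,], y[2,])      # collapse each column into a "name1 name2" key
    }))
    # [[1]]
    # [1] "john maria"
    #
    # [[2]]
    # [1] "john maria"
    #
    # [[3]]
    # [1] "carlos john"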
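Since the question allows any number of names per sublist, it may help to see what the pair-generation step produces for a longer sublist. A minimal sketch with a hypothetical three-name sublist (not part of the original data); the second form uses combn's FUN argument as an equivalent shorthand:

```r
# Hypothetical three-name sublist, for illustration only
x <- c("maria", "john", "carlos")

# Sorting first means "maria john" and "john maria" map to the same key
y <- combn(sort(x), 2)          # 2 x 3 matrix: one column per unordered pair
pairs <- paste(y[1, ], y[2, ])
print(pairs)
# pairs: "carlos john", "carlos maria", "john maria"

# Equivalent: let combn apply paste to each pair directly;
# as.vector() drops any dim attribute combn may attach to the result
pairs2 <- as.vector(combn(sort(x), 2, FUN = paste, collapse = " "))
print(identical(pairs, pairs2))
```

With n names in a sublist this yields choose(n, 2) pair keys, so a sublist of three names contributes three pairs to the counts.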

Then I would count the occurrences of each pair with table, after unlisting:

    (tab <- table(unlist(pdat)))
    #
    # carlos john  john maria
    #           1           2
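If the aim is mainly to see which names occur together, rather than to score each sublist, the same pair counts can also be laid out as a name-by-name matrix. This is a swapped-in technique, not part of the answer above: build a 0/1 incidence matrix (sublists by names) and take its crossproduct. A minimal sketch on the same data; the variable names names_all, inc and co are illustrative, and because it uses %in%, a name repeated within one sublist is counted once:

```r
dat <- list(c("john", "maria"), c("john", "maria"), c("john", "carlos"))

# All distinct names, sorted
names_all <- sort(unique(unlist(dat)))

# Incidence matrix: one row per sublist, one column per name (1 = name present)
inc <- t(vapply(dat, function(x) as.integer(names_all %in% x),
                integer(length(names_all))))
colnames(inc) <- names_all

# crossprod(inc) is t(inc) %*% inc: entry [a, b] counts sublists containing both a and b
co <- crossprod(inc)
diag(co) <- 0   # the diagonal would only count how many sublists contain each name
print(co)
```

Reading off co["john", "maria"] and co["carlos", "john"] gives the same counts as tab (2 and 1), and pairs that never co-occur show up explicitly as 0.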

Finally, I would compute the score for each element of the list by summing the frequencies of its pairs:

    sapply(pdat, function(x) sum(tab[x]))
    # [1] 2 2 1
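The three steps can be wrapped into one function that returns a named list in the shape of the expected output from the question (lapply instead of sapply keeps the list form and the names). This is a sketch: cooccur_scores is a hypothetical name, and it makes two assumptions beyond the original: repeated names within a sublist are collapsed with unique(), and a sublist with fewer than two distinct names scores 0 (combn would otherwise stop with an "n < m" error).

```r
# Hypothetical helper wrapping the three steps from the answer
cooccur_scores <- function(dat) {
  # Step 1: sorted "a b" keys for every pair of distinct names in each sublist
  pdat <- lapply(dat, function(x) {
    x <- sort(unique(x))                      # assumption: duplicates count once
    if (length(x) < 2) return(character(0))  # combn() needs at least 2 names
    y <- combn(x, 2)
    paste(y[1, ], y[2, ])
  })
  # Step 2: how often each pair occurs across all sublists
  tab <- table(unlist(pdat))
  # Step 3: score of a sublist = sum of the counts of its pairs
  lapply(pdat, function(p) sum(tab[p]))
}

dat <- list(`1` = c("john", "maria"),
            `2` = c("john", "maria"),
            `3` = c("john", "carlos"))
cooccur_scores(dat)
# scores: 2, 2, 1 (a list named `1`, `2`, `3`)
```

With more names per sublist, each sublist accumulates the counts of all its pairs. For example, with a = c("john", "maria", "carlos"), b = c("maria", "john") and c = c("carlos", "ana"), "john maria" occurs twice and every other pair once, so the scores are 4 for a (1 + 1 + 2), 2 for b and 1 for c.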

Data:

    (dat <- list(c("john", "maria"), c("john", "maria"), c("john", "carlos")))
    # [[1]]
    # [1] "john"  "maria"
    #
    # [[2]]
    # [1] "john"  "maria"
    #
    # [[3]]
    # [1] "john"   "carlos"
