r - Counting co-occurrence of strings in lists of lists

- January 15, 2011

Assume I have a list of lists as follows:

    $`1`
    [1] "john"
    [2] "maria"

    $`2`
    [1] "john"
    [2] "maria"

    $`3`
    [1] "john"
    [2] "carlos"

I am trying to figure out which names have occurred together in a sublist. For example, "john" and "maria" occurred together twice, so the sublists with those names should get a score of 2, whereas "john" and "carlos" occurred together once, so that sublist should get a score of 1. The expected output would be:

    $`1`
    [1] 2

    $`2`
    [1] 2

    $`3`
    [1] 1

Also, assume there is an unlimited number of names in each sublist. The key is to identify instances where 2 names occur together more than once, and give them an additional "point" each time they co-occur.

I would first generate the pairs of names in the lists using lapply and combn:

    (pdat <- lapply(dat, function(x) {
      y <- combn(sort(x), 2)
      paste(y[1,], y[2,])
    }))
    # [[1]]
    # [1] "john maria"
    #
    # [[2]]
    # [1] "john maria"
    #
    # [[3]]
    # [1] "carlos john"
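To see why this step also covers sublists with more than two names, here is a minimal sketch that runs the same combn call on a hypothetical three-name sublist. The name "ana" is not in the original data and is only there for illustration.

```r
# Hypothetical sublist with three names; "ana" is made up for illustration.
x <- c("maria", "john", "ana")

# Sorting first makes each pair order-independent
# ("john maria" is produced, never "maria john").
y <- combn(sort(x), 2)

# combn returns a 2-row matrix with one column per pair;
# pasting the two rows gives one string key per pair.
pairs <- paste(y[1, ], y[2, ])
print(pairs)
# [1] "ana john"   "ana maria"  "john maria"
```

Three names give three pairs, so a longer sublist simply contributes more keys to the frequency count in the next step.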

Then count how often each pair occurs using table and unlist:

    (tab <- table(unlist(pdat)))
    # carlos john  john maria
    #           1           2

Finally, compute the score for each element in the list by summing the frequencies of its pairs:

    sapply(pdat, function(x) sum(tab[x]))
    # [1] 2 2 1
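The three steps above can be wrapped into one function. The following is only a sketch under a few assumptions that the original post does not state: the function name cooccurrence_scores is hypothetical, a name repeated within one sublist is counted once, and a sublist with fewer than two names scores 0 (combn would otherwise fail on it).

```r
# Hypothetical helper (not from the original answer): score each sublist by
# summing the overall frequency of every name pair it contains.
cooccurrence_scores <- function(dat) {
  pdat <- lapply(dat, function(x) {
    x <- sort(unique(x))                      # assumption: duplicates count once
    if (length(x) < 2) return(character(0))   # no pair can be formed
    y <- combn(x, 2)
    paste(y[1, ], y[2, ])
  })
  tab <- table(unlist(pdat))                  # frequency of each pair overall
  # vapply guarantees a numeric vector, even for sublists without pairs
  vapply(pdat, function(p) sum(tab[p]), numeric(1))
}

# Data from the post
dat <- list(c("john", "maria"), c("john", "maria"), c("john", "carlos"))
cooccurrence_scores(dat)
# [1] 2 2 1

# Made-up data with a three-name sublist and a single-name sublist
dat2 <- list(c("john", "maria", "carlos"), c("maria", "john"), "john")
cooccurrence_scores(dat2)
# [1] 4 2 0
```

Note that the pair keys are built with paste, so names that themselves contain spaces could produce ambiguous keys; a different separator would be needed in that case.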

Data:

    (dat <- list(c("john", "maria"), c("john", "maria"), c("john", "carlos")))
    # [[1]]
    # [1] "john"  "maria"
    #
    # [[2]]
    # [1] "john"  "maria"
    #
    # [[3]]
    # [1] "john"   "carlos"
