java - Tokenizing and Indexing many files


I have read several files and want to index each word in them. The index should follow this format:

requirement ==> word , {d1,tf1,d2,tf2,d4,tf4} , someothervalue

Explanation:

1) word = a word in the files
2) d1, d2, d4, ... = file IDs
3) tf1, tf2, tf4, ... = the number of times the word appears in d1, d2, d4 respectively

I created a class "Token" that contains the word found in the files ('String token'), the name of the file it belongs to ('String fileId'), and its frequency in that file ('int count').

I can check the various words in one file and update their counts; I used an ArrayList to do so. When the same word appears in another file, how can I append that fileId and count while indexing?

I would create a

class RefCount {
    String fileId;
    int count;

    RefCount( String fileId ){
        this.fileId = fileId;
        count = 1;   // created on the first occurrence in this file
    }

    void increment(){
        count++;
    }
    // more...
}

and the class Token should be

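class Token {
    String word;
    List<RefCount> references = new ArrayList<>();
    ...

    public void countWord( String fileId ){
        int last = references.size() - 1;
        if( last >= 0 ){
            RefCount rc = references.get(last);
            // Same file as the previous occurrence: bump its count.
            if( rc.fileId.equals(fileId) ){
                rc.increment();
                return;
            }
        }
        // First occurrence of the word in this file.
        references.add( new RefCount(fileId) );
    }
    // more...
}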

This assumes that references are added file by file, so only the last file ID needs to be checked to determine whether we are still in the same file.

You should use a Map<String,Token> rather than a List, so that the Token for a given word can be looked up directly.
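As a rough sketch of how the map could tie the pieces together: the class name WordIndex, its add/get methods, and the format() helper are made up for this illustration and are not part of the classes above. It assumes words arrive file by file, as described, and it leaves out the "someothervalue" field from the requirement.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper: maps each word to its Token (illustration only).
class WordIndex {

    static class RefCount {
        final String fileId;
        int count = 1;                       // created on first occurrence
        RefCount(String fileId) { this.fileId = fileId; }
    }

    static class Token {
        final String word;
        final List<RefCount> references = new ArrayList<>();
        Token(String word) { this.word = word; }

        // Assumes files are processed one after another.
        void countWord(String fileId) {
            int last = references.size() - 1;
            if (last >= 0 && references.get(last).fileId.equals(fileId)) {
                references.get(last).count++;
                return;
            }
            references.add(new RefCount(fileId));
        }

        // Renders "word , {d1,tf1,d2,tf2}" (without someothervalue).
        String format() {
            StringBuilder sb = new StringBuilder(word).append(" , {");
            for (int i = 0; i < references.size(); i++) {
                RefCount rc = references.get(i);
                if (i > 0) sb.append(',');
                sb.append(rc.fileId).append(',').append(rc.count);
            }
            return sb.append('}').toString();
        }
    }

    // LinkedHashMap keeps words in first-seen order; a HashMap works too.
    private final Map<String, Token> tokens = new LinkedHashMap<>();

    // Looks up the Token for the word, creating it on first sight.
    void add(String word, String fileId) {
        tokens.computeIfAbsent(word, Token::new).countWord(fileId);
    }

    Token get(String word) { return tokens.get(word); }

    public static void main(String[] args) {
        WordIndex index = new WordIndex();
        index.add("apple", "d1");
        index.add("apple", "d1");
        index.add("pear", "d1");
        index.add("apple", "d2");
        System.out.println(index.get("apple").format()); // apple , {d1,2,d2,1}
    }
}
```

With the map, finding the Token for a word no longer requires scanning a list; the per-file bookkeeping stays inside countWord.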

Edit: To display the results, you can iterate over the map or list of tokens and, for each token, over its list of RefCount objects:

for( Token token: tokenList ){
    System.out.print( token.getWord() + ":" );
    for( RefCount refCount: token.getReferences() ){
        System.out.print( " " + refCount.getFileId() +
                          "*" + refCount.getCount() );
    }
    System.out.println();
}

You may want to terminate the line after every n-th id/count pair.
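One way to break the line after every n-th pair is sketched below. The class PairPrinter and its format method are hypothetical names, and the parallel lists of file IDs and counts merely stand in for the RefCount list to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lays out id*count pairs, n pairs per line.
class PairPrinter {

    static String format(String word, List<String> fileIds,
                         List<Integer> counts, int n) {
        StringBuilder sb = new StringBuilder(word).append(':');
        for (int i = 0; i < fileIds.size(); i++) {
            sb.append(' ').append(fileIds.get(i))
              .append('*').append(counts.get(i));
            boolean lastPair = (i == fileIds.size() - 1);
            // Break after every n-th pair, but not after the final one.
            if ((i + 1) % n == 0 && !lastPair) {
                sb.append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three pairs, two per line.
        System.out.println(format("apple",
                Arrays.asList("d1", "d2", "d3"),
                Arrays.asList(2, 1, 4), 2));
    }
}
```

The same modulo check can be dropped into the inner loop of the display code above, using a counter that is incremented for each RefCount.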

