Tag Archives: order statistics

Top K or K-most frequent words in a document

Given a document (or stream) of words, find the top k most frequent words in it. For example, if stream = “aa bb cc bb bb cc dd dd ee ff ee dd aa ee”, the word counts are {“dd”=3, “ee”=3, “ff”=1, “aa”=2, “bb”=3, “cc”=2}. Then the top 3 most frequent words are: {“dd”, “ee”, […]
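
As a rough illustration of the problem statement, here is a minimal Python sketch that counts the words with a hash map and then selects the k largest counts with a heap. The function name `top_k_words` is hypothetical, and this is one common approach rather than necessarily the one the full post goes on to describe.

```python
import heapq
from collections import Counter


def top_k_words(words, k):
    """Return k words with the highest frequency (hypothetical helper).

    Words with equal counts may come back in any relative order.
    """
    # Count occurrences of every word in a single pass.
    counts = Counter(words)
    # Pick the k keys with the largest counts using a bounded heap.
    return heapq.nlargest(k, counts, key=counts.get)


stream = "aa bb cc bb bb cc dd dd ee ff ee dd aa ee"
top3 = top_k_words(stream.split(), 3)
print(sorted(top3))
```

For the example stream, the three words with count 3 are selected. Counting takes time linear in the number of words, and the selection step takes O(m log k) for m distinct words, so a full sort of the vocabulary is avoided.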