Java Stanford NLP: Find word frequency?

Question

I'm using the Stanford NLP Parsing toolkit. Given a word in the lexicon, how can I find its frequency*? Or, given a frequency rank, how can I determine the corresponding word?

*in the entire language, not just the text sample.

This is a demo of the toolkit I'm using:

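import java.util.Arrays;
import java.util.Collection;

import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.*;

class ParserDemo {
  public static void main(String[] args) {
    // Load the serialized English PCFG grammar and set parser options.
    LexicalizedParser lp = new LexicalizedParser("englishPCFG.ser.gz");
    lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"});

    // Parse a pre-tokenized sentence and print the Penn Treebank-style tree.
    String[] sent = { "Sincerity", "may", "frighten", "the", "boy", "." };
    Tree parse = (Tree) lp.apply(Arrays.asList(sent));
    parse.pennPrint();
    System.out.println();

    // Derive the collapsed typed dependencies from the parse tree.
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection tdl = gs.typedDependenciesCollapsed();
    System.out.println(tdl);
    System.out.println();

    // Print the tree and its dependencies in a single call.
    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
  }
}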

Answer 1

If you are only counting word frequencies, sentence parsing is unnecessary. All you need to do is tokenise the input and then count word frequencies using a Java HashMap. If you want to use the Stanford tools, use any of the tokenisers in edu.stanford.nlp.process.
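As a rough illustration of the tokenise-then-count approach, here is a minimal sketch that uses only the Java standard library. The class name WordFrequency and the helper countWords are made up for this example, and the regex split is a crude stand-in for a real tokeniser such as those in edu.stanford.nlp.process (it will, for instance, break "don't" into two tokens).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class WordFrequency {
    // Hypothetical helper: lower-cases the text, splits on runs of
    // non-letter characters, and tallies each token in a HashMap.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The boy saw the dog. The dog ran.");
        System.out.println("the -> " + counts.get("the"));
        System.out.println("cat -> " + counts.getOrDefault("cat", 0));
    }
}
```

Note that these counts describe only the text you feed in. To approximate frequency in the language as a whole, you would need to run the same counting over a large, representative corpus.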

This gives you the frequency of any given word, but in general it may not be possible to find the word corresponding to a given frequency rank, since some words may be equally frequent in the document.
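One way to cope with ties is to let a rank map to a list of words rather than a single word. The sketch below assumes you already have a word-to-count map; the class FrequencyRank and the method wordsAtRank are hypothetical names, and it uses "dense" ranking (all words with the same count share one rank, and the next distinct count gets the next rank), which is only one of several possible conventions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FrequencyRank {
    // Hypothetical helper: returns every word at the given 1-based rank.
    // Words with equal counts share a rank; an unknown rank gives an empty list.
    static List<String> wordsAtRank(Map<String, Integer> counts, int rank) {
        // Group words by count, iterating from the highest count down.
        TreeMap<Integer, List<String>> byCount = new TreeMap<>(Comparator.reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        int current = 1;
        for (List<String> words : byCount.values()) {
            if (current == rank) {
                Collections.sort(words); // alphabetical order within a tie
                return words;
            }
            current++;
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 3);
        counts.put("dog", 2);
        counts.put("boy", 1);
        counts.put("saw", 1);
        counts.put("ran", 1);
        System.out.println("rank 1: " + wordsAtRank(counts, 1));
        System.out.println("rank 3: " + wordsAtRank(counts, 3));
    }
}
```

Here rank 3 is shared by three words that each occur once, which is exactly the situation where a single "word at rank N" is not well defined.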

Answer 2

This is an IR (information retrieval) problem more than NLP. One should look at libraries like Lucene for this task.

2 answers

solution1
1 2009-12-01 11:42:09

solution2
0 2014-02-27 23:11:32