
Stanford Parser out of memory

I am trying to run the Stanford parser on Ubuntu from Python code. The text file I am trying to parse is 500 MB, and I have 32 GB of RAM. I am increasing the JVM heap size, but I don't know whether the increase actually takes effect, because I get this error every time. Please help me out.

***  WARNING!! OUT OF MEMORY! THERE WAS NOT ENOUGH  ***
***  MEMORY TO RUN ALL PARSERS.  EITHER GIVE THE    ***
***  JVM MORE MEMORY, SET THE MAXIMUM SENTENCE      ***
***  LENGTH WITH -maxLength, OR PERHAPS YOU ARE     ***
***  HAPPY TO HAVE THE PARSER FALL BACK TO USING    ***
***  A SIMPLER PARSER FOR VERY LONG SENTENCES.      ***
Sentence has no parse using PCFG grammar (or no PCFG fallback).  Skipping...
Exception in thread "main" edu.stanford.nlp.parser.common.NoSuchParseException
    at edu.stanford.nlp.parser.lexparser.LexicalizedParserQuery.getBestParse(LexicalizedParserQuery.java:398)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParserQuery.getBestParse(LexicalizedParserQuery.java:370)
    at edu.stanford.nlp.parser.lexparser.ParseFiles.processResults(ParseFiles.java:271)
    at edu.stanford.nlp.parser.lexparser.ParseFiles.parseFiles(ParseFiles.java:215)
    at edu.stanford.nlp.parser.lexparser.ParseFiles.parseFiles(ParseFiles.java:74)
    at edu.stanford.nlp.parser.lexparser.LexicalizedParser.main(LexicalizedParser.java:1513)

You should divide the text file into small pieces and give them to the parser one at a time. Since the parser builds an in-memory representation of each whole "document" it is given (which is orders of magnitude bigger than the document on disk), it is a very bad idea to try to give it a 500 MB document in one gulp.
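For example, here is a minimal sketch of the splitting step in Python. It assumes the input has line breaks to split on; the output file names and the 10,000-line chunk size are arbitrary choices for illustration, not anything the parser requires.

    # Sketch: split a large text file into smaller pieces before parsing.
    # Assumes roughly one sentence per line; chunk size and names are arbitrary.
    def split_file(path, lines_per_chunk=10000):
        chunk_paths = []

        def flush(lines, index):
            out_path = "chunk_%04d.txt" % index
            with open(out_path, "w", encoding="utf-8") as out:
                out.writelines(lines)
            chunk_paths.append(out_path)

        with open(path, encoding="utf-8") as src:
            chunk, index = [], 0
            for line in src:
                chunk.append(line)
                if len(chunk) >= lines_per_chunk:
                    flush(chunk, index)
                    chunk, index = [], index + 1
            if chunk:  # write the final partial chunk
                flush(chunk, index)
        return chunk_paths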

You should also avoid super-long "sentences", which can easily occur if casual or web-scraped text lacks sentence delimiters, or you are feeding it big tables or gibberish. The safest way to avoid this issue is to set a parameter limiting the maximum sentence length, such as -maxLength 100.
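Since you are driving the parser from Python, one way to combine a bigger heap with the sentence-length cap is to call the command-line parser on each chunk via subprocess. This is only a sketch: the jar and model paths are assumptions, so adjust them to wherever your Stanford Parser distribution actually lives.

    # Sketch: run the parser on one chunk at a time with a larger heap and
    # a maximum sentence length. Jar/model paths are assumptions.
    import subprocess

    def parse_chunk(chunk_path):
        cmd = [
            "java", "-mx8g",                  # give the JVM an 8 GB heap
            "-cp", "stanford-parser.jar:stanford-parser-models.jar",
            "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
            "-maxLength", "100",              # skip sentences over 100 tokens
            "-outputFormat", "penn",
            "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
            chunk_path,
        ]
        return subprocess.run(cmd, capture_output=True, text=True)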

You might want to try out the neural network dependency parser, which scales better to large tasks: http://nlp.stanford.edu/software/nndep.shtml .
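If you go that route, the same chunk-at-a-time approach applies; the sketch below shows one possible invocation. The classpath and model path here are assumptions, so check your CoreNLP distribution for the actual jar names and the bundled nndep model.

    # Sketch: the same idea with the neural dependency parser.
    # Classpath and model path are assumptions; verify against your install.
    import subprocess

    def depparse_chunk(chunk_path, out_path):
        cmd = [
            "java", "-mx8g",
            "-cp", "stanford-corenlp.jar:stanford-corenlp-models.jar",
            "edu.stanford.nlp.parser.nndep.DependencyParser",
            "-model", "edu/stanford/nlp/models/parser/nndep/english_UD.gz",
            "-textFile", chunk_path,
            "-outFile", out_path,
        ]
        return subprocess.run(cmd, capture_output=True, text=True)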
