
How do I use Stanford Parser's Typed Dependencies in Python?

To see an example of Typed Dependencies, check out the end of the output from this online example.

When I run the Stanford Parser on the command line using lexparser.sh, it outputs the tree and the typed dependencies.

But when I run it using nltk.parse.stanford, all I get is the tree, with no typed dependencies. I can modify it to return the dependencies by setting -outputFormat="penn,typedDependencies" as documented here, though I'd just get the text. I wonder if somebody else has already done the work to process this into a more useful form.
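For what it's worth, NLTK also ships a StanfordDependencyParser wrapper (alongside StanfordParser) that returns dependency graphs rather than raw text. A minimal sketch, assuming local copies of the parser and model jars (the paths below are placeholders):

from nltk.parse.stanford import StanfordDependencyParser

# placeholder paths to the Stanford Parser jars
dep_parser = StanfordDependencyParser(
    path_to_jar='stanford-parser.jar',
    path_to_models_jar='stanford-parser-models.jar')

for graph in dep_parser.raw_parse('The quick brown fox jumps over the lazy dog.'):
    # graph is an NLTK DependencyGraph; triples() yields
    # ((governor, gov_tag), relation, (dependent, dep_tag)) tuples
    for governor, relation, dependent in graph.triples():
        print(governor, relation, dependent)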

The Stanford CoreNLP website lists several extensions for Python, though most of them seem like related forks. From glancing at the source code, this one looks promising for dealing with dependencies, though it is totally undocumented and I'm not sure how to use it.

Many of these libraries offer to run as a service and communicate via HTTP. I wonder if that would be faster than the way NLTK interacts with the parser, since it might not require a new JVM to start up repeatedly.
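As a rough sketch of that HTTP route, assuming a CoreNLP server is already running locally on port 9000 and that the JSON field names below match your CoreNLP version:

import json
import requests

text = 'The quick brown fox jumps over the lazy dog.'
properties = {'annotators': 'tokenize,ssplit,pos,depparse', 'outputFormat': 'json'}
response = requests.post('http://localhost:9000/',
                         params={'properties': json.dumps(properties)},
                         data=text.encode('utf-8'))
doc = response.json()
for sentence in doc['sentences']:
    # each dependency entry has the relation plus governor/dependent token info
    for dep in sentence['basicDependencies']:
        print(dep['dep'], dep['governorGloss'], dep['dependentGloss'])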

I'm not quite sure what the difference between CoreNLP and the Stanford Parser is.

I also found this, though it uses JPype and I wasn't able to get that to compile.

I recently did a project that relied heavily on CoreNLP and the Stanford Parser. To start, if you're going to use it, I highly recommend writing your code in Java, as using it with Python is a giant pain. However, I did manage to get it to work.

I recommend using this to talk to CoreNLP; it worked the best for me. This will require starting up the JVM and communicating with it locally (although it does this for you). It also has the lovely error of sometimes either returning the previous parse instead of the one just sent, or not returning at all. We used a decorator that would restart the parse after a set amount of time, which can be found here.
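A rough sketch of the idea (not the exact decorator linked above): give each parse a deadline and restart it if it hasn't returned, assuming Unix signals and that the parse runs in the main thread:

import signal
from functools import wraps

def retry_on_timeout(seconds, retries=3):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            def handler(signum, frame):
                raise TimeoutError('parse timed out')
            for attempt in range(retries):
                signal.signal(signal.SIGALRM, handler)
                signal.alarm(seconds)          # arm the deadline
                try:
                    return func(*args, **kwargs)
                except TimeoutError:
                    continue                   # restart the parse
                finally:
                    signal.alarm(0)            # cancel any pending alarm
            raise TimeoutError('parse failed after %d attempts' % retries)
        return wrapper
    return decorator

# usage: wrap whatever function sends text to the CoreNLP wrapper
# @retry_on_timeout(seconds=60)
# def parse(text): ...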

I wish you the best of luck with this, as it was quite the task. Also note that the NLTK Stanford parser wrapper is incomplete compared to the full CoreNLP. You shouldn't need NLTK to use CoreNLP, which will provide essentially everything you need, from NER to POS to dependencies.

Just gave an answer to another question that suits this one better :)

I've been parsing the XML output of CoreNLP using minidom. Here is some starter code to get going, though you may also want to check https://github.com/dasmith/stanford-corenlp-python

Note that you need to use the tokenization produced by Stanford CoreNLP, since the data returned is based on sentence and token offsets.

from xml.dom import minidom

# raw_xml_data is the XML produced by CoreNLP's XML output option;
# parser is whatever constituency-tree reader you use for the bracketed parse
# (with nltk.tree.Tree.fromstring you would call .leaves() instead of .get_leaves()).
xmldoc = minidom.parseString(raw_xml_data)
dependencies = []
for sentence_xml in xmldoc.getElementsByTagName('sentences')[0].getElementsByTagName('sentence'):
    parse = parser.parse(sentence_xml.getElementsByTagName('parse')[0].firstChild.nodeValue)
    # pair each <token> element with the corresponding leaf of the parse tree
    tokens = [(i, j) for i, j in zip(sentence_xml.getElementsByTagName('tokens')[0].getElementsByTagName('token'),
                                     parse.get_leaves())]
    # example for processing dependencies
    elements = sentence_xml.getElementsByTagName('dependencies')
    for element in elements:
        if element.getAttribute('type') == "collapsed-ccprocessed-dependencies":
            dependencies += [i for i in element.getElementsByTagName('dep')]
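Continuing the sketch above, each <dep> element in the CoreNLP XML carries the relation in its type attribute, with <governor> and <dependent> children holding the word text plus a 1-based idx token offset (assuming the XML layout of recent CoreNLP releases), so you can flatten the collected dependencies like this:

for dep in dependencies:
    relation = dep.getAttribute('type')
    governor = dep.getElementsByTagName('governor')[0]
    dependent = dep.getElementsByTagName('dependent')[0]
    print(relation,
          governor.getAttribute('idx'), governor.firstChild.nodeValue,
          dependent.getAttribute('idx'), dependent.firstChild.nodeValue)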
