Instantiating and using StanfordTagger within NLTK

I apologize for the newbie nature of this question. I have been trying to figure out Python packaging and namespaces, but the finer points seem to elude me. To wit, I would like to use the Python wrapper for the Stanford part-of-speech tagger. I had no trouble finding the documentation here, which provides a usage sample:

st = StanfordTagger('bidirectional-distsim-wsj-0-18.tagger')
st.tag('What is the airspeed of an unladen swallow ?'.split())
    [('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]

This looks great, but I can't seem to get the right namespaces to show up in my local Python + NLTK installation (I have the latest NLTK version, and have tried the below in Python 2.6.x as well as 2.7.x):

>>> import nltk
>>> from nltk import *
>>> from nltk.tag import stanford
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name stanford

I also tried this import statement, which fails in the same way:

>>> from nltk.tag.stanford import StanfordTagger
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named stanford

Searching around here on SO, I found this question, where the poster seems to be experiencing the exact same problem but gets past the namespace step as follows:

The problem is that my nltk lib doesnt contain the stanford module. So I copied the same into the appropriate folder and compiled the same.

It sounds like this is indeed the same issue, except I can't for the life of me find any documentation on how to add modules to NLTK. Everything I read on the NLTK web site implies that the Stanford module should already be packaged into the base install. So, a question in two parts:

  1. (Specific) Any suggestions for getting past this particular issue and starting to use StanfordTagger from Python? I know I can easily call the jar directly and then interpret the output in Python - that's all the Python wrapper does anyway - but I would like to get this to work out of principle, if nothing else.
  2. (General) What is a good pythonic approach to investigating missing packaging issues or dependencies such as above?

Suggestions: a. Look in the nltk directory installed on your PC. I checked mine and stanford.py is not there (i.e. it is missing from the nltk/tag/ directory). You can quickly find where to look by running this:

import os
import distutils.sysconfig
# Print the path where nltk/tag/ should live under site-packages
print(os.path.join(distutils.sysconfig.get_python_lib(), 'nltk', 'tag'))

b. If it's not there, copy the stanford.py file from the source you mentioned into the nltk/tag directory on your PC (the path you found in step a).
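For the general part of the question, one possible way to investigate this kind of problem is to ask the interpreter where it actually imported a package from, and then check whether the expected file is in that directory. The snippet below is only a sketch of that idea: the helper names package_dir and has_module_file are made up for illustration and are not part of NLTK, and it assumes the package is a regular directory-based package with a __file__ attribute.

```python
import importlib
import os


def package_dir(package_name):
    """Return the directory the named package was actually imported from."""
    module = importlib.import_module(package_name)
    return os.path.dirname(os.path.abspath(module.__file__))


def has_module_file(package_name, filename):
    """Return True if `filename` exists inside the installed package directory."""
    return os.path.isfile(os.path.join(package_dir(package_name), filename))


if __name__ == "__main__":
    try:
        # Where is nltk.tag installed, and does it ship stanford.py?
        print(package_dir("nltk.tag"))
        print(has_module_file("nltk.tag", "stanford.py"))
    except ImportError:
        print("nltk is not importable from this interpreter")
```

If the second line printed is False, the installed NLTK simply does not include that module, which matches the ImportError in the question. Printing nltk.__version__ in the same session is also a quick way to compare the installed release against the one the documentation describes.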

I hope it works out.
