How do I use WordNet 3.1 with NLTK in Python?

Question

My research requires WordNet 3.1, but NLTK (Python) ships with WordNet 3.0 by default. It is important that I use the latest version of WordNet.

>>> from nltk.corpus import wordnet
>>> wordnet.get_version()
'3.0'

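However, although WordNet 3.1 is the latest version, I cannot find any way to download and access it with nltk.download(), so I am looking for a workaround.

As stated on the WordNet website (link to the current version here), which I quote below:

WordNet 3.1 database files only

You can download the WordNet 3.1 database files. Note that this is not a full package, nor does it contain any code for running WordNet. However, you can replace the files in the database directory of your 3.0 local installation with these files, and the WordNet interface will run, returning entries from the 3.1 database. This is simply a compressed tar file of the WordNet 3.1 database files.

I tried downloading the WordNet 3.1 database files and using them to replace the default WordNet files under C:\Users\<username>\AppData\Roaming\nltk_data\corpora (on Windows). I suspected it would not work, since the instructions are about replacing the database files in a WordNet software installation, but I tried anyway.

When I ran wordnet.get_version(), I got the following error.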

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-2-d64ae1e68b36> in <module>
----> 1 wordnet.get_version()

~\anaconda3\lib\site-packages\nltk\corpus\util.py in __getattr__(self, attr)
    118             raise AttributeError("LazyCorpusLoader object has no attribute '__bases__'")
    119 
--> 120         self.__load()
    121         # This looks circular, but its not, since __load() changes our
    122         # __class__ to something new:

~\anaconda3\lib\site-packages\nltk\corpus\util.py in __load(self)
     86 
     87         # Load the corpus.
---> 88         corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)
     89 
     90         # This is where the magic happens!  Transform ourselves into

~\anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py in __init__(self, root, omw_reader)
   1136 
   1137         # Load the lexnames
-> 1138         for i, line in enumerate(self.open("lexnames")):
   1139             index, lexname, _ = line.split()
   1140             assert int(index) == i

~\anaconda3\lib\site-packages\nltk\corpus\reader\api.py in open(self, file)
    206         """
    207         encoding = self.encoding(file)
--> 208         stream = self._root.join(file).open(encoding)
    209         return stream
    210 

~\anaconda3\lib\site-packages\nltk\data.py in join(self, fileid)
    335     def join(self, fileid):
    336         _path = os.path.join(self._path, fileid)
--> 337         return FileSystemPathPointer(_path)
    338 
    339     def __repr__(self):

~\anaconda3\lib\site-packages\nltk\compat.py in _decorator(*args, **kwargs)
     39     def _decorator(*args, **kwargs):
     40         args = (args[0], add_py3_data(args[1])) + args[2:]
---> 41         return init_func(*args, **kwargs)
     42 
     43     return wraps(init_func)(_decorator)

~\anaconda3\lib\site-packages\nltk\data.py in __init__(self, _path)
    313         _path = os.path.abspath(_path)
    314         if not os.path.exists(_path):
--> 315             raise IOError("No such file or directory: %r" % _path)
    316         self._path = _path
    317 

OSError: No such file or directory: 'C:\\Users\\Punit Singh\\AppData\\Roaming\\nltk_data\\corpora\\wordnet\\lexnames'

I then checked the file structure. The trees before and after the replacement are listed below.

File tree in WordNet 3.0

wordnet
├── adj.exc
├── adv.exc
├── citation.bib
├── cntlist.rev
├── data.adj
├── data.adv
├── data.noun
├── data.verb
├── index.adj
├── index.adv
├── index.noun
├── index.sense
├── index.verb
├── lexnames
├── LICENSE
├── noun.exc
├── README
├── verb.exc

File tree in WordNet 3.1

wordnet
├── adj.exc
├── adv.exc
├── cntlist
├── cntlist.rev
├── cousin.exc
├── data.adj
├── data.adv
├── data.noun
├── data.verb
├── index.adj
├── index.adv
├── index.noun
├── index.sense
├── index.verb
├── log.grind.3.1
├── noun.exc
├── sentidx.vrb
├── dbfiles
    ├── adj.all
    ├── adj.pert
    ├── adj.ppl
    ├── adv.all
    ├── cntlist
    ├── noun.act
    ├── noun.animal
    ├── noun.artifact
    ├── noun.attribute
    ├── noun.body
    ├── noun.cognition
    ├── noun.communication
    ├── noun.event
    ├── noun.feeling
    ├── noun.food
    ├── noun.group
    ├── noun.location
    ├── noun.motive
    ├── noun.object
    ├── noun.person
    ├── noun.phenomenon
    ├── noun.plant
    ├── noun.possession
    ├── noun.process
    ├── noun.quantity
    ├── noun.relation
    ├── noun.shape
    ├── noun.state
    ├── noun.substance
    ├── noun.time
    ├── noun.Tops
    ├── verb.body
    ├── verb.change
    ├── verb.cognition
    ├── verb.communication
    ├── verb.competition
    ├── verb.consumption
    ├── verb.contact
    ├── verb.creation
    ├── verb.emotion
    ├── verb.Framestext
    ├── verb.motion
    ├── verb.perception
    ├── verb.possession
    ├── verb.social
    ├── verb.stative
    ├── verb.weather
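
The two trees can also be compared programmatically. The following is a minimal sketch, not part of the original question: it takes the top-level file names from the WordNet 3.0 tree above as a reference and reports which of them are absent from a given folder. The function name `missing_files` and the constant `WN30_FILES` are made up for illustration.

```python
import os

# Top-level files of the WordNet 3.0 folder, copied from the tree listed above.
WN30_FILES = [
    "adj.exc", "adv.exc", "citation.bib", "cntlist.rev",
    "data.adj", "data.adv", "data.noun", "data.verb",
    "index.adj", "index.adv", "index.noun", "index.sense", "index.verb",
    "lexnames", "LICENSE", "noun.exc", "README", "verb.exc",
]


def missing_files(reference_names, target_dir):
    """Return the reference file names that are absent from target_dir, sorted."""
    present = set(os.listdir(target_dir))
    return sorted(name for name in reference_names if name not in present)
```

Calling `missing_files(WN30_FILES, "/path/to/nltk_data/corpora/wordnet")` on the folder that holds the 3.1 files should list lexnames among the missing names, which is the file the traceback above complains about.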

Any suggestion or solution for using WordNet 3.1 with NLTK (Python) would be helpful.

Thanks in advance.

Answer 1

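After a lot of searching and trial and error, I was able to use WordNet 3.1 with NLTK (Python). I adapted this gist to make it work. The details are below.

I split the code from the gist into 3 parts.

Part 1: download_extract.py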

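import os
import shutil
import tarfile
import urllib.request

nltkdata_wn = '/path/to/nltk_data/corpora/wordnet'  # no trailing slash
backup_dir = nltkdata_wn + '_3.0'
wn31 = "http://wordnetcode.princeton.edu/wn3.1.dict.tar.gz"

# Back up the existing WordNet 3.0 folder as wordnet_3.0 (only once).
if not os.path.exists(backup_dir):
    shutil.move(nltkdata_wn, backup_dir)
os.makedirs(nltkdata_wn, exist_ok=True)

# Download the WordNet 3.1 database archive if it is not already present.
if not os.path.exists('wn3.1.dict.tar.gz'):
    urllib.request.urlretrieve(wn31, 'wn3.1.dict.tar.gz')

# Extract the archive, then move the contents of dict/ up into wordnet/.
with tarfile.open('wn3.1.dict.tar.gz') as tar:
    tar.extractall(nltkdata_wn)
dict_dir = os.path.join(nltkdata_wn, 'dict')
for name in os.listdir(dict_dir):
    shutil.move(os.path.join(dict_dir, name), nltkdata_wn)
os.rmdir(dict_dir)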

This backs up the existing WordNet 3.0 folder from wordnet to wordnet_3.0, downloads the WordNet 3.1 database, and puts it into the wordnet folder. Since I am on Windows, I did this step manually.

Part 2: create_lexnames.py

import os

nltkdata_wn = '/path/to/nltk_data/corpora/wordnet/'
dbfiles = nltkdata_wn + 'dbfiles'

# Map the part-of-speech prefix of each lexicographer file to its category number.
syncats = {"noun": 1, "verb": 2, "adj": 3, "adv": 4}

with open(nltkdata_wn + 'lexnames', 'w') as fout:
    # One line per file in dbfiles: zero-padded index, file name, category.
    for i, name in enumerate(sorted(os.listdir(dbfiles))):
        pos = name.partition('.')[0]
        # Files without a known prefix (e.g. cntlist) use their own name as the category.
        syncat = syncats.get(pos, name)
        fout.write("\t".join([str(i).zfill(2), name, str(syncat)]) + "\n")

This creates the required lexnames file in the wordnet folder.
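
As the traceback in the question shows, NLTK reads lexnames line by line and ties each name to its position in the file, so the order of the generated entries matters. It may therefore be worth comparing the generated file with the lexnames file kept in the 3.0 backup. The following is a minimal sketch under that assumption; the function names `read_lexnames` and `diff_lexnames` are hypothetical, and whether the 3.0 ordering is the right reference for 3.1 is something to verify rather than take for granted.

```python
def read_lexnames(path):
    """Return the lexicographer file names in the order they appear in a lexnames file."""
    names = []
    with open(path, encoding="utf-8") as fin:
        for line in fin:
            fields = line.split()
            if fields:  # skip blank lines
                names.append(fields[1])
    return names


def diff_lexnames(generated_path, reference_path):
    """Return (index, generated_name, reference_name) for every position that differs."""
    generated = read_lexnames(generated_path)
    reference = read_lexnames(reference_path)
    differences = []
    for i in range(max(len(generated), len(reference))):
        gen = generated[i] if i < len(generated) else None
        ref = reference[i] if i < len(reference) else None
        if gen != ref:
            differences.append((i, gen, ref))
    return differences
```

An empty result means both files list the same names in the same order. Any tuple in the result marks an index where a lookup by position would return a different name than it did with the 3.0 file.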

Part 3: testing_wn31.py

from nltk.corpus import wordnet as wn

nltkdata_wn = '/path/to/nltk_data/corpora/wordnet/'

# Check the generated lexnames file: each line must have three fields,
# and the index in the first field must match the line's position.
with open(nltkdata_wn + 'lexnames', 'r') as fin:
    for i, line in enumerate(fin):
        index, lexname, _ = line.split()
        assert int(index) == i

# Test the WordNet functions.
print(wn.synsets('dog'))
for synset in wn.all_synsets():
    print(synset, synset.pos(), synset.definition())

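This checks the generated lexnames file and also tests that the WordNet functions work correctly.

After completing this process, I ran the following code in Python and found that it is indeed running version 3.1: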

>>> from nltk.corpus import wordnet
>>> wordnet.get_version()
'3.1'

A word of caution

After replacing the database with WordNet 3.1, you will notice the following if you run this code:

>>> import nltk
>>> nltk.download()

In the download dialog, under the Corpora tab, WordNet will be shown as out of date. Do not try to update it, because that will either replace WordNet with version 3.0 or break it.
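
If the folder does get overwritten by accident, the version on disk can be checked without going through NLTK. The following is a minimal sketch, assuming the license header at the top of data.adj contains a line of the form "WordNet 3.1 Copyright ..."; that format is an assumption to confirm against your own files, and the function name `wordnet_version_on_disk` is hypothetical.

```python
import re


def wordnet_version_on_disk(data_file_path):
    """Return the version string found in a WordNet data file, or None if absent."""
    # Assumed header format: "... WordNet 3.1 Copyright ..."
    pattern = re.compile(r"Word[nN]et (\d+(?:\.\d+)?) Copyright")
    with open(data_file_path, encoding="utf-8") as fin:
        for line in fin:
            match = pattern.search(line)
            if match:
                return match.group(1)
    return None
```

For example, `wordnet_version_on_disk("/path/to/nltk_data/corpora/wordnet/data.adj")` can be run before and after touching the download dialog to see whether the files changed.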
