
Using Python, NLTK, to analyse German text

I am a beginner in Python and am currently trying to use NLTK to analyze German text (extracting the German nouns and their frequencies) by following this tutorial: https://datascience.blog.wzb.eu/2016/07/13/accurate-part-of-speech-tagging-of-german-texts-with-nltk/

I ran into several issues along the way that I have not been able to solve.

When I follow the tutorial and execute the code below:

import random

tagged_sents = list(corp.tagged_sents())
random.shuffle(tagged_sents)
split_perc = 0.1
split_size = int(len(tagged_sents) * split_perc)
train_sents, test_sents = tagged_sents[split_size:], tagged_sents[:split_size]

it fails with this error:

Traceback (most recent call last):
  File "test2.py", line 7, in <module>
    tagged_sents = list(corp.tagged_sents())
  File "C:\Users\User\anaconda3\lib\site-packages\nltk\corpus\reader\conll.py", line 130, in tagged_sents
    return LazyMap(get_tagged_words, self._grids(fileids))
  File "C:\Users\User\anaconda3\lib\site-packages\nltk\corpus\reader\conll.py", line 215, in _grids
    return concat(
  File "C:\Users\User\anaconda3\lib\site-packages\nltk\corpus\reader\util.py", line 433, in concat
    raise ValueError("concat() expects at least one object!")
ValueError: concat() expects at least one object!

Then I tried to fix it by following this solution: https://teamtreehouse.com/community/randomshuffle-crashes-when-passed-a-range-somenums-randomshufflerange5250

and changed tagged_sents = list(corp.tagged_sents()) to tagged_sents = list(range(5,250))

The ValueError no longer appeared, but I don't know what (5,250) means, even though I have read the explanation.

Then I continued with the next step:

from ClassifierBasedGermanTagger.ClassifierBasedGermanTagger import ClassifierBasedGermanTagger
tagger = ClassifierBasedGermanTagger(train=train_sents) 

And it shows:

Traceback (most recent call last):
  File "test1.py", line 90, in <module>
    from ClassifierBasedGermanTagger.ClassifierBasedGermanTagger import ClassifierBasedGermanTagger
ModuleNotFoundError: No module named 'ClassifierBasedGermanTagger' 

I have already downloaded ClassifierBasedGermanTagger.py and __init__.py and put them in the folder that is open in VS Code. I don't know whether that is correct. The tutorial says:

'Using his Python class ClassifierBasedGermanTagger (which you can download from the github page) we can create a tagger and train it with the data from the TIGER corpus:'

Please help me fix these problems, thanks!

First of all, welcome to StackOverflow. Before posting a question, please make sure that you have done your own research; most of the time that solves the problem.

Secondly, range(start, end) is a very basic Python function that produces a sequence of integers from start up to, but not including, end. Using it the way you did will not solve the problem: it only replaces the tagged sentences with plain numbers, so there is nothing left for the tagger to learn from. I suggest using print to see what data corp actually contains and debugging from there. Maybe corp is just empty, and that is why you don't get any tagged_sents.
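To make both points concrete, here is a minimal sketch. The first part shows what range(5, 250) produces. The second part is a hypothetical helper, corpus_summary, that relies only on the fileids() method NLTK corpus readers provide; EmptyReader is a stand-in invented for illustration so the snippet runs without the TIGER file. The traceback in the question suggests, though I cannot confirm it from here, that the reader found no files at all.

```python
# range(start, stop) counts from start up to, but not including, stop.
numbers = list(range(5, 250))
print(numbers[:3], numbers[-1], len(numbers))  # [5, 6, 7] 249 245


def corpus_summary(corp):
    """Report how many files a corpus reader found (hypothetical helper)."""
    fileids = list(corp.fileids())
    if not fileids:
        return "no files matched; check the folder and file name given to the reader"
    return "{} file(s) matched: {}".format(len(fileids), fileids)


class EmptyReader:
    """Stand-in for a reader whose file name matched nothing (illustration only)."""

    def fileids(self):
        return []


print(corpus_summary(EmptyReader()))
```

With the real reader you would call print(corpus_summary(corp)), or simply print(corp.fileids()), before corp.tagged_sents(). If the list is empty, the corpus file is probably not in the folder or under the name that was passed when corp was created.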

For the import part, it is not clear to me where you put ClassifierBasedGermanTagger.py, but wherever it is, it is not visible to your code. You can try putting your code (test2.py) and ClassifierBasedGermanTagger.py in the same directory. Note that the import line as written expects a folder named ClassifierBasedGermanTagger, containing __init__.py and ClassifierBasedGermanTagger.py, next to your script; if the .py file sits directly beside the script instead, the import becomes from ClassifierBasedGermanTagger import ClassifierBasedGermanTagger. Read the link below for more details on how to properly import modules in Python.

https://docs.python.org/3/reference/import.html
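A quick way to check whether Python can see the module at all is sketched below. module_visible is a hypothetical helper name; it only wraps the standard library's importlib.util.find_spec. Run it from the same script that raises the ModuleNotFoundError.

```python
import importlib.util
import sys


def module_visible(name):
    """Return True if Python can locate a top-level module or package called name."""
    return importlib.util.find_spec(name) is not None


# The first entries of sys.path are searched first; when a script is run
# directly, the script's own folder normally comes first.
print(sys.path[:3])

# False here means the file or folder is not in any of the searched locations.
print(module_visible("ClassifierBasedGermanTagger"))
```

If the second line prints False, move the downloaded files into one of the folders listed by the first line, most simply the folder that contains your script.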
