Converting a readability formula into a Python function

Question

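I was given a formula called FRES (Flesch Reading Ease Score), which measures the readability of a document:

FRES = 206.835 - 1.015 * (total words / total sentences) - 84.6 * (total syllables / total words)

My task is to write a Python function that returns the FRES of a text, so I need to convert this formula into a Python function.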
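As a minimal sketch, the formula can be written as a function of three counts. The constants are the ones used in the code in this thread; the helper name `fres_from_counts` and the sample counts are hypothetical and only serve to illustrate the arithmetic.

```python
def fres_from_counts(num_words, num_sents, num_syllables):
    """Flesch reading-ease score from raw counts (hypothetical helper)."""
    if num_words == 0 or num_sents == 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = num_words / num_sents
    syllables_per_word = num_syllables / num_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Made-up counts: 100 words, 5 sentences, 130 syllables.
# 206.835 - 1.015 * 20 - 84.6 * 1.3 is about 76.555
score = fres_from_counts(num_words=100, num_sents=5, num_syllables=130)
print(round(score, 3))
```

Separating the arithmetic from the counting makes it easier to see that any difference in the final score must come from how words, sentences, and syllables are counted.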

I reimplemented my code based on an answer I received. Here is what I have so far and the result it gives me:

import nltk
import collections
nltk.download('punkt')
nltk.download('gutenberg')
nltk.download('brown')
nltk.download('averaged_perceptron_tagger')
nltk.download('universal_tagset')

import re
from itertools import chain
from nltk.corpus import gutenberg
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))

def compute_fres(text):
    """Return the FRES of a text.
    >>> emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
    >>> compute_fres(emma) # doctest: +ELLIPSIS
    99.40...
    """

    for filename in gutenberg.fileids():
        sents = gutenberg.sents(filename)
        words = gutenberg.words(filename)
        num_sents = len(sents)
        num_words = len(words)
        num_syllables = sum(count_syllables(w) for w in words)
        score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    return(score)

After running the code, this is the result message I get:

Failure

Expected :99.40...

Actual   :92.84866041488623

File "C:/Users/PycharmProjects/a1/a1.py", line 60, in a1.compute_fres
Failed example:
    compute_fres(emma) # doctest: +ELLIPSIS

Expected:
    99.40...
Got:
    92.84866041488623

My function should pass the doctest and return 99.40... I also cannot edit the syllable function, because it was provided with the assignment:

import re
VC = re.compile('[aeiou]+[^aeiou]+', re.I)
def count_syllables(word):
    return len(VC.findall(word))
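To see what this function actually counts, the sketch below prints the regex matches for a few sample words (the words are chosen for illustration only). The pattern counts each run of vowels that is followed by at least one non-vowel character, so a trailing vowel is not counted and `y` is treated as a consonant.

```python
import re

VC = re.compile('[aeiou]+[^aeiou]+', re.I)


def count_syllables(word):
    return len(VC.findall(word))


# Show the matched vowel-consonant groups and the resulting count.
for word in ["Emma", "readability", "sentence", "the", ","]:
    print(repr(word), VC.findall(word), count_syllables(word))

# 'Emma'        -> ['Emm']                     -> 1
# 'readability' -> ['ead', 'ab', 'il', 'ity'] -> 4
# 'sentence'    -> ['ent', 'enc']             -> 2
# 'the'         -> []                          -> 0
# ','           -> []                          -> 0
```

Short words such as "the" and punctuation tokens contribute zero syllables, so the choice of which tokens count as words changes the syllables-per-word ratio.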

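This problem is quite tricky, but at least it now gives me a result rather than an error message. I am just not sure why the result is different.

Any help would be much appreciated. Thank you.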

Answer 1

As an aside, there is the textstat library.

from textstat.textstat import textstat
from nltk.corpus import gutenberg

for filename in gutenberg.fileids():
    # Pass the raw text of the file, not its filename.
    print(filename, textstat.flesch_reading_ease(gutenberg.raw(filename)))

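If you intend to write the code yourself, then first:

decide whether punctuation counts as a word
define how to count the number of syllables in a word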
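The effect of the first decision can be sketched on a single hand-tokenized sentence. The token list and the `is_word` helper below are hypothetical, and the syllable counter is the regex from the question.

```python
import re

VC = re.compile('[aeiou]+[^aeiou]+', re.I)


def count_syllables(word):
    return len(VC.findall(word))


def is_word(token):
    # Assumption: a token is a word if it has at least one letter or digit.
    return any(ch.isalnum() for ch in token)


def fres(words, num_sents):
    num_syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / num_sents) - 84.6 * (num_syllables / len(words))


# One sentence, tokenized by hand, with punctuation as separate tokens.
tokens = ["Emma", "Woodhouse", ",", "handsome", ",", "clever", ",", "and", "rich", "."]
words_only = [t for t in tokens if is_word(t)]

print(len(tokens), len(words_only))  # 10 6
print(fres(tokens, 1))      # punctuation counted as words: about 120.5
print(fres(words_only, 1))  # punctuation excluded: about 73.8
```

Punctuation tokens add to the word count but contribute no syllables, so counting them lengthens sentences while lowering the syllables-per-word ratio. On this sample the second effect dominates and the score is much higher.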

If punctuation counts as a word, and syllables are counted with the regex from your question, then:

import re
from itertools import chain
from nltk.corpus import gutenberg

def num_syllables_per_word(word):
    return len(re.findall('[aeiou]+[^aeiou]+', word))

for filename in gutenberg.fileids():
    sents = gutenberg.sents(filename)
    words = gutenberg.words(filename) # i.e. list(chain(*sents))
    num_sents = len(sents)
    num_words = len(words)
    num_syllables = sum(num_syllables_per_word(w) for w in words)
    score = 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)
    print(filename, score)
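Note that the `compute_fres` function in the question never uses its `text` argument: it loops over every Gutenberg file and returns the score of the last one, whatever text is passed in. The following is a minimal sketch of a version that scores only the text it receives. The regex-based tokenizers `naive_sents` and `naive_words` are assumptions made to keep the example self-contained; NLTK's `nltk.sent_tokenize` and `nltk.word_tokenize` could be passed in their place. Whether this reproduces the 99.40 expected by the doctest depends on the tokenization the assignment assumes, which the question does not state.

```python
import re

VC = re.compile('[aeiou]+[^aeiou]+', re.I)


def count_syllables(word):
    return len(VC.findall(word))


def naive_sents(text):
    # Assumption: a sentence ends at '.', '!' or '?'. Real tokenizers are smarter.
    return [s for s in re.split(r'[.!?]+', text) if s.strip()]


def naive_words(text):
    # Assumption: runs of word characters and single punctuation marks are tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def compute_fres(text, sent_tokenize=naive_sents, word_tokenize=naive_words):
    """Return the FRES of the given text only (sketch, not the assignment's reference)."""
    sents = sent_tokenize(text)
    words = word_tokenize(text)
    if not sents or not words:
        raise ValueError("text must contain at least one sentence and one word")
    num_sents = len(sents)
    num_words = len(words)
    num_syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (num_words / num_sents) - 84.6 * (num_syllables / num_words)


# 2 sentences, 8 tokens, 4 counted syllables: about 160.475
print(compute_fres("The cat sat. The dog ran!"))
```

Passing the tokenizers as parameters keeps the formula fixed while making it easy to try different definitions of a word and a sentence.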
