How to calculate average word and sentence length in Python 2.7 from a text file
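I have been stuck on this for the past two weeks and was wondering if you could help.
I am trying to calculate the average word length and the average sentence length from a text file. I just can't seem to get my head around it. I have only just started using functions that are then called from a main file.
My main file looks like this: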
import Consonants
import Vowels
import Sentences
import Questions
import Words
""" Vowels """
text = Vowels.fileToString("test.txt")
x = Vowels.countVowels(text)
print str(x) + " Vowels"
""" Consonants """
text = Consonants.fileToString("test.txt")
x = Consonants.countConsonants(text)
print str(x) + " Consonants"
""" Sentences """
text = Sentences.fileToString("test.txt")
x = Sentences.countSentences(text)
print str(x) + " Sentences"
""" Questions """
text = Questions.fileToString("test.txt")
x = Questions.countQuestions(text)
print str(x) + " Questions"
""" Words """
text = Words.fileToString("test.txt")
x = Words.countWords(text)
print str(x) + " Words"
One of my function files looks like this:
def fileToString(filename):
    myFile = open(filename, "r")
    myText = ""
    for ch in myFile:
        myText = myText + ch
    return myText
def countWords(text):
    vcount = 0
    spaces = [' ']
    for letter in text:
        if (letter in spaces):
            vcount = vcount + 1
    return vcount
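How would I go about calculating word length as a function that can be imported? I have tried a few of the other threads on here, but they did not work correctly for me.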
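I'll try to give you an algorithm: loop with for over enumerate(), split() the text, and check how each word ends with endswith(). Something like: for ind,word in enumerate(readlines.split()): if word.endswith("?") ..... if word.endswith("!")
Then put the words in a dictionary and use the ind (index) values in a while loop: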
obj = "Hey there! how are you? I hope you are ok."
dict1 = {}
for ind,word in enumerate(obj.split()):
    dict1[ind]=word
x = 0
while x<len(dict1):
    if "?" in dict1[x]:
        print (list(dict1.values())[:x+1])
    x += 1
Output:
>>>
['Hey', 'there!', 'how', 'are', 'you?']
>>>
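You see, I cut the text off at the point where ? is reached, so I now have a sentence in a list (you could change the check to ! instead). I can get the length of each element, and the rest is simple math: sum the lengths of the elements, then divide by the length of the list. In theory, that gives the average.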
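The "simple math" described above can be sketched as follows. This is only an illustration: the helper name average_length is hypothetical, and the lengths it counts still include trailing punctuation such as ! and ?, exactly as the list from the output does.

```python
def average_length(items):
    """Return the mean length of the strings in items (0.0 if empty)."""
    if not items:
        return 0.0
    # float() keeps the division exact on Python 2.7 as well
    return float(sum(len(item) for item in items)) / len(items)

# The sentence list printed by the loop above
sentence = ['Hey', 'there!', 'how', 'are', 'you?']
avg = average_length(sentence)
print(avg)
```

Guarding against an empty list avoids a ZeroDivisionError when no sentence was found.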
Keep in mind that this is only the algorithm. You will have to change this code to fit your data; the key points are enumerate(), endswith() and dict.
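One possible way to adapt the idea to a whole text is sketched below. It is a variant rather than the code above: it drops the dict and the while loop and relies only on split() and endswith(), which accepts a tuple of suffixes. The function name split_sentences and the terminators parameter are assumptions made for this example.

```python
def split_sentences(text, terminators=(".", "!", "?")):
    """Group whitespace-separated words into sentences (lists of words)."""
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        # A word ending in a terminator closes the current sentence
        if word.endswith(terminators):
            sentences.append(current)
            current = []
    if current:
        # Trailing words without a terminator still form a sentence
        sentences.append(current)
    return sentences

obj = "Hey there! how are you? I hope you are ok."
sentences = split_sentences(obj)
words_per_sentence = [len(s) for s in sentences]
avg_sentence_length = float(sum(words_per_sentence)) / len(words_per_sentence)
```

Here sentence length is measured in words; abbreviations such as "e.g." would be treated as sentence ends, so real data may need extra rules.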
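Honestly, when you are matching things like words and sentences, you are better off learning and using regular expressions than relying on str.split alone to catch every corner case.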
#test.txt
Here is some text. It is written on more than one line, and will have several sentences.
Some sentences will have their OWN line!
It will also have a question. Is this the question? I think it is.
#!/usr/bin/python
from __future__ import division  # true division on Python 2.7 as well
import re
with open('test.txt') as infile:
    data = infile.read()
sentence_pat = re.compile(r"""
    \b                # sentences will start with a word boundary
    ([^.!?]+[.!?]+)   # continue with one or more non-sentence-ending
                      # characters, followed by one or more sentence-
                      # ending characters.""", re.X)
word_pat = re.compile(r"""
    (\S+)             # Words are just groups of non-whitespace together
    """, re.X)
sentences = sentence_pat.findall(data)
words = word_pat.findall(data)
average_sentence_length = sum([len(sentence) for sentence in sentences])/len(sentences)
average_word_length = sum([len(word) for word in words])/len(words)
DEMO:
>>> sentences
['Here is some text.',
'It is written on more than one line, and will have several sentences.',
'Some sentences will have their OWN line!',
'It will also have a question.',
'Is this the question?',
'I think it is.']
>>> words
['Here',
'is',
'some',
'text.',
'It',
'is',
... ,
'I',
'think',
'it',
'is.']
>>> average_sentence_length
31.833333333333332
>>> average_word_length
4.184210526315789
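Note that the figures in the demo count characters, and that words such as 'text.' keep their punctuation. If sentence length in words and punctuation-free word lengths are wanted instead, a variation along these lines might work. It is a sketch only: bare_word_pat and the inline sample string are assumptions for illustration, not part of the answer above.

```python
import re

# Inline sample so the sketch runs without a file
sample = ("Here is some text. It is written on more than one line, "
          "and will have several sentences.\n"
          "Some sentences will have their OWN line!")

sentence_pat = re.compile(r"[^.!?]+[.!?]+")
# Assumed definition of a word: letters, digits and apostrophes only
bare_word_pat = re.compile(r"[A-Za-z0-9']+")

sentences = [s.strip() for s in sentence_pat.findall(sample)]
words_per_sentence = [len(bare_word_pat.findall(s)) for s in sentences]
bare_words = bare_word_pat.findall(sample)

avg_words_per_sentence = float(sum(words_per_sentence)) / len(words_per_sentence)
avg_bare_word_length = float(sum(len(w) for w in bare_words)) / len(bare_words)
```

The character class is deliberately narrow; text with accented letters or hyphenated words would need a wider pattern.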
To answer the question:
How would I go about calculating word length as a function that can be imported?
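def avg_word_len(filename):
    word_lengths = []
    # Collect the length of every whitespace-separated word in the file
    with open(filename) as infile:
        for line in infile:
            word_lengths.extend([len(word) for word in line.split()])
    # float() avoids integer division on Python 2.7
    return float(sum(word_lengths)) / len(word_lengths)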
Note: this does not account for . and ! at the end of words, and so on.
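If trailing punctuation should not count toward word length, one hedged option is to strip it from each word first. The sketch below works on a string rather than a filename, so it could sit next to fileToString in a module such as Words.py; the name avg_word_len_no_punct is hypothetical.

```python
import string

def avg_word_len_no_punct(text):
    """Average word length, ignoring leading and trailing punctuation."""
    # strip() removes punctuation only from the ends of each word
    words = [word.strip(string.punctuation) for word in text.split()]
    # Drop tokens that were nothing but punctuation
    words = [word for word in words if word]
    if not words:
        return 0.0
    return float(sum(len(word) for word in words)) / len(words)

result = avg_word_len_no_punct("Is this the question? I think it is.")
```

Punctuation inside a word, such as the apostrophe in "don't", is kept and still counts as a character.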
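This does not apply if you want to write the script yourself, but I would use NLTK. It has some very good tools for dealing with very long texts.
This page provides a cheat sheet for nltk. You should be able to import your text, get the sentences as a large list of lists, and get a list of n-grams (words of length n). Then you can calculate the averages.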