How to calculate average word & sentence length in Python 2.7 from a text file
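I have been stuck on this for the past two weeks and was wondering if you could help.
I am trying to calculate the average word length and sentence length from a text file. I just can't seem to get my head around it. I have only just started using functions that are then called from a main file.
My main file looks like this: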
import Consonants
import Vowels
import Sentences
import Questions
import Words
""" Vowels """
text = Vowels.fileToString("test.txt")
x = Vowels.countVowels(text)
print str(x) + " Vowels"
""" Consonants """
text = Consonants.fileToString("test.txt")
x = Consonants.countConsonants(text)
print str(x) + " Consonants"
""" Sentences """
text = Sentences.fileToString("test.txt")
x = Sentences.countSentences(text)
print str(x) + " Sentences"
""" Questions """
text = Questions.fileToString("test.txt")
x = Questions.countQuestions(text)
print str(x) + " Questions"
""" Words """
text = Words.fileToString("test.txt")
x = Words.countWords(text)
print str(x) + " Words"
One of my function files looks like this:
def fileToString(filename):
    myFile = open(filename, "r")
    myText = ""
    for ch in myFile:
        myText = myText + ch
    return myText

def countWords(text):
    vcount = 0
    spaces = [' ']
    for letter in text:
        if (letter in spaces):
            vcount = vcount + 1
    return vcount
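I was wondering how I would go about calculating word length as a function to import? I have tried some of the other threads on here, but they did not work correctly for me.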
I'll try to give you an algorithm: use enumerate() in a for loop, split() the text, and check how each word ends with endswith(). Like; for ind,word in enumerate(readlines.split()): if word.endswith("?") ..... if word.endswith("!")
Then put the words in a dict and use the ind (index) value in a while loop;
obj = "Hey there! how are you? I hope you are ok."
dict1 = {}
for ind,word in enumerate(obj.split()):
    dict1[ind]=word
x = 0
while x<len(dict1):
    if "?" in dict1[x]:
        print (list(dict1.values())[:x+1])
    x += 1
Output:
>>>
['Hey', 'there!', 'how', 'are', 'you?']
>>>
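You see, I cut the text off at the point where it reaches the ?. So now I have a sentence in a list (you can change the marker to !). I can get the length of each element, and the rest is simple math: find the sum of the element lengths, then divide it by the length of the list. In theory, that gives the average.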
Remember, this is only the algorithm. You will have to change this code to fit your data; the key points are enumerate(), endswith() and dict.
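The arithmetic described above can be sketched as a small function. This is only an illustration under assumptions: the name average_word_length_per_sentence is hypothetical, a sentence is assumed to end at any word that finishes with ., ! or ?, and a plain list replaces the dict because only the order of the words matters. Punctuation attached to a word still counts toward its length, as in the snippet above.

```python
def average_word_length_per_sentence(text, enders=(".", "!", "?")):
    """Return one average word length per sentence (hypothetical helper)."""
    averages = []
    current = []  # words of the sentence currently being collected
    for word in text.split():
        current.append(word)
        if word.endswith(enders):  # endswith() accepts a tuple of suffixes
            # Sentence complete: sum of word lengths / number of words.
            # float() keeps the division fractional under Python 2.7.
            averages.append(sum(len(w) for w in current) / float(len(current)))
            current = []
    if current:  # trailing words with no sentence-ending mark
        averages.append(sum(len(w) for w in current) / float(len(current)))
    return averages


obj = "Hey there! how are you? I hope you are ok."
result = average_word_length_per_sentence(obj)
```

Here result holds three values, one for each of the three sentences in obj.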
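Honestly, when you are matching things like words and sentences, you are better off learning and using regular expressions than relying on str.split alone to catch every corner case.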
#test.txt
Here is some text. It is written on more than one line, and will have several sentences.
Some sentences will have their OWN line!
It will also have a question. Is this the question? I think it is.
#!/usr/bin/python
import re
with open('test.txt') as infile:
    data = infile.read()
sentence_pat = re.compile(r"""
    \b                # sentences will start with a word boundary
    ([^.!?]+[.!?]+)   # continue with one or more non-sentence-ending
                      # characters, followed by one or more sentence-
                      # ending characters.""", re.X)
word_pat = re.compile(r"""
    (\S+)             # Words are just groups of non-whitespace together
    """, re.X)
sentences = sentence_pat.findall(data)
words = word_pat.findall(data)
# float() keeps the averages fractional under Python 2.7, where int / int truncates
average_sentence_length = sum([len(sentence) for sentence in sentences])/float(len(sentences))
average_word_length = sum([len(word) for word in words])/float(len(words))
DEMO:
>>> sentences
['Here is some text.',
'It is written on more than one line, and will have several sentences.',
'Some sentences will have their OWN line!',
'It will also have a question.',
'Is this the question?',
'I think it is.']
>>> words
['Here',
'is',
'some',
'text.',
'It',
'is',
... ,
'I',
'think',
'it',
'is.']
>>> average_sentence_length
31.833333333333332
>>> average_word_length
4.184210526315789
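Note that the words in the demo above keep their trailing punctuation ('text.', 'is.'), which inflates the average word length slightly. A possible variant, packaged as a function that could live in an importable module, is sketched below. The name text_averages is hypothetical, a word is assumed to be a run of \w characters (so "don't" would split into two words), and sentence length is measured in words here rather than in characters.

```python
import re

SENTENCE_PAT = re.compile(r"\b[^.!?]+[.!?]+")
WORD_PAT = re.compile(r"\w+")  # assumption: a word is a run of letters, digits or underscores


def text_averages(data):
    """Return (average words per sentence, average characters per word)."""
    sentences = SENTENCE_PAT.findall(data)
    words = WORD_PAT.findall(data)
    if not sentences or not words:
        return 0.0, 0.0
    # Count the words inside each matched sentence, then average the counts
    words_per_sentence = [len(WORD_PAT.findall(s)) for s in sentences]
    avg_sentence = sum(words_per_sentence) / float(len(sentences))
    avg_word = sum(len(w) for w in words) / float(len(words))
    return avg_sentence, avg_word


sample = "Here is some text. Is this the question? I think it is."
avg_sentence, avg_word = text_averages(sample)
```

With a file, the same function could be called on the string returned by infile.read().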
To answer this question:
I was wondering how I would go about calculating word length as a function to import?
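def avg_word_len(filename):
    word_lengths = []
    # Collect the length of every whitespace-separated word in the file
    with open(filename) as infile:
        for line in infile:
            word_lengths.extend([len(word) for word in line.split()])
    # float() avoids integer division under Python 2.7
    return sum(word_lengths)/float(len(word_lengths))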
Note: this does not account for . and ! and similar characters at the end of words.
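One way to handle that, sketched here under assumptions, is to strip punctuation from both ends of each word before measuring it. The name avg_word_len_stripped is hypothetical, and the function takes the text as a string (for example the result of fileToString from the question) rather than a filename.

```python
import string


def avg_word_len_stripped(text):
    """Average word length, ignoring leading/trailing punctuation (hypothetical helper)."""
    lengths = []
    for word in text.split():
        core = word.strip(string.punctuation)  # 'line!' -> 'line'
        if core:  # skip tokens made only of punctuation
            lengths.append(len(core))
    if not lengths:
        return 0.0
    # float() avoids integer division under Python 2.7
    return sum(lengths) / float(len(lengths))


sample = "Some sentences will have their OWN line!"
average = avg_word_len_stripped(sample)
```

Punctuation inside a word, such as the apostrophe in "don't", is left untouched by str.strip, so such words keep their full length.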
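This does not apply if you want to write the script yourself, but I would use NLTK. It has some really nice tools for dealing with very long texts.
This page provides a cheat sheet for nltk. You should be able to import your text, get the sentences as a big list of lists, and get a list of n-grams (sequences of n consecutive tokens). Then you can calculate the average.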