How to calculate average word and sentence length from a text file in Python 2.7

Question

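I have been stuck on this for the past two weeks and was wondering if you could help.

I am trying to calculate the average word length and sentence length from a text file, and I just can't seem to get my head around it. I have only just started using functions that are then called from a main file.

My main file looks like this: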

import Consonants
import Vowels
import Sentences
import Questions
import Words

""" Vowels """


text = Vowels.fileToString("test.txt")    
x = Vowels.countVowels(text)

print str(x) + " Vowels"

""" Consonants """

text = Consonants.fileToString("test.txt")    
x = Consonants.countConsonants(text)


print str(x) + " Consonants"

""" Sentences """


text = Sentences.fileToString("test.txt")    
x = Sentences.countSentences(text)
print str(x) + " Sentences"


""" Questions """

text = Questions.fileToString("test.txt")    
x = Questions.countQuestions(text)

print str(x) + " Questions"

""" Words """
text = Words.fileToString("test.txt")    
x = Words.countWords(text)

print str(x) + " Words"

One of my function files looks like this:

def fileToString(filename):
    myFile = open(filename, "r")
    myText = ""
    for ch in myFile:
        myText = myText + ch
    return myText

def countWords(text):
    vcount = 0
    spaces = [' ']
    for letter in text:
        if (letter in spaces):
            vcount = vcount + 1
    return vcount

I would like to know how to calculate word length as an imported function. I have tried a few other threads on here, but they did not work for me.

Answer 1

I'll try to give you an algorithm.

Read the file, split() it, loop over the result with a for loop using enumerate(), and check how each word ends with endswith(). Like this:

for ind, word in enumerate(readlines.split()):
    if word.endswith("?") .....
    if word.endswith("!")

Then put the words in a dictionary and use the ind (index) value in a while loop:

obj = "Hey there! how are you? I hope you are ok."
dict1 = {}
for ind,word in enumerate(obj.split()):
    dict1[ind]=word

x = 0
while x<len(dict1):
    if "?" in dict1[x]:
        print (list(dict1.values())[:x+1])
    x += 1

Output:

>>> 
['Hey', 'there!', 'how', 'are', 'you?']
>>>

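You see, I cut the text off at the point where the ? is reached, so I now have a sentence in a list (you can change it to !). I can get the length of each element, and the rest is simple math: sum the lengths of the elements, then divide by the length of the list. In theory, that gives the average.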
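
The "simple math" described above could be sketched as follows. This is only an illustration, not the answerer's code: the helper name average_length is hypothetical, and it counts trailing punctuation such as ! and ? as part of each word, exactly as the list printed above does.

```python
from __future__ import division, print_function  # float division on Python 2.7


def average_length(items):
    """Return the mean length of the strings in items (hypothetical helper)."""
    if not items:
        return 0.0  # avoid dividing by zero for an empty list
    return sum(len(item) for item in items) / len(items)


# The list printed by the loop above; punctuation is still attached.
sentence = ['Hey', 'there!', 'how', 'are', 'you?']
print(average_length(sentence))  # 19 characters over 5 words -> 3.8
```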

Keep in mind that this is only an algorithm. You will certainly have to change this code to fit your data; the key points are enumerate(), endswith() and dict.

Answer 2

Honestly, when you are matching things like words and sentences, you are better off learning and using regular expressions than relying on str.split alone to catch every case.

#test.txt
Here is some text. It is written on more than one line, and will have several sentences.

Some sentences will have their OWN line!

It will also have a question. Is this the question? I think it is.

#!/usr/bin/python

from __future__ import division  # true division, so the averages are floats on Python 2.7

import re

with open('test.txt') as infile:
    data = infile.read()

sentence_pat = re.compile(r"""
    \b                # sentences will start with a word boundary
    ([^.!?]+[.!?]+)   # continue with one or more non-sentence-ending
                      #    characters, followed by one or more sentence-
                      #    ending characters.""", re.X)

word_pat = re.compile(r"""
    (\S+)             # Words are just groups of non-whitespace together
    """, re.X)

sentences = sentence_pat.findall(data)
words = word_pat.findall(data)

average_sentence_length = sum([len(sentence) for sentence in sentences])/len(sentences)
average_word_length = sum([len(word) for word in words])/len(words)

DEMO:

>>> sentences
['Here is some text.',
 'It is written on more than one line, and will have several sentences.',
 'Some sentences will have their OWN line!',
 'It will also have a question.',
 'Is this the question?',
 'I think it is.']

>>> words
['Here',
 'is',
 'some',
 'text.',
 'It',
 'is',
 ... ,
 'I',
 'think',
 'it',
 'is.']

>>> average_sentence_length
31.833333333333332

>>> average_word_length
4.184210526315789
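
To tie this back to the question, the same two patterns could be wrapped in functions that a main file imports. The sketch below is one guess at how that might look, not part of the original answer: the names avgSentenceLength and avgWordLength are hypothetical, and each function takes the text string that the question's fileToString returns.

```python
from __future__ import division  # float division on Python 2.7

import re

# Same patterns as in the answer above, written without re.X.
SENTENCE_PAT = re.compile(r"\b([^.!?]+[.!?]+)")
WORD_PAT = re.compile(r"(\S+)")


def _average_match_length(pattern, text):
    """Mean length of all matches of pattern in text, or 0.0 if there are none."""
    matches = pattern.findall(text)
    if not matches:
        return 0.0
    return sum(len(match) for match in matches) / len(matches)


def avgSentenceLength(text):
    """Average sentence length in characters (hypothetical name)."""
    return _average_match_length(SENTENCE_PAT, text)


def avgWordLength(text):
    """Average word length in characters, attached punctuation included."""
    return _average_match_length(WORD_PAT, text)
```

If this were saved as, say, Averages.py (a made-up module name), the main file could import it like the other modules and call Averages.avgWordLength(text) on the string read from test.txt.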

Answer 3

To answer this question:

I would like to know how to calculate word length as an imported function?

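def avg_word_len(filename):
    word_lengths = []
    # The with block closes the file once reading is done.
    with open(filename) as infile:
        for line in infile:
            # Split on whitespace and record the length of every word.
            word_lengths.extend(len(word) for word in line.split())
    # float() avoids integer division on Python 2.7.
    return float(sum(word_lengths)) / len(word_lengths)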

Note: this does not account for the . and ! at the end of words, etc.
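
One way to address that caveat is to strip punctuation from each word before measuring it. The following is a sketch under stated assumptions, not part of the original answer: the name avg_word_len_no_punct is hypothetical, it takes the text as a string rather than a filename, and it only removes ASCII punctuation from the two ends of each word (string.punctuation), so an apostrophe inside a word is still counted.

```python
from __future__ import division, print_function  # float division on Python 2.7

import string


def avg_word_len_no_punct(text):
    """Average word length, ignoring leading and trailing ASCII punctuation."""
    word_lengths = []
    for word in text.split():
        stripped = word.strip(string.punctuation)
        if stripped:  # skip tokens that consisted of punctuation only
            word_lengths.append(len(stripped))
    if not word_lengths:
        return 0.0  # no words found, avoid dividing by zero
    return sum(word_lengths) / len(word_lengths)


print(avg_word_len_no_punct("Hey there! how are you?"))  # 17 letters over 5 words -> 3.4
```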

Answer 4

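This does not apply if you want to write the script yourself, but I would use NLTK. It has some really nice tools for working with very long texts.

This page provides a cheat sheet for nltk. You should be able to import your text, get the sentences as one big list, and get a list of n-grams (words of length n). Then you can calculate the average.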

Answer 1
1 2015-02-05 00:33:53

Answer 2
0 2015-02-05 00:43:00

Answer 3
0 2015-02-05 00:49:00

Answer 4
0 2015-02-05 00:49:57
