How to loop over every word in a line and then every line in a file?
Count the number of characters in every word of every line of a file
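This code prints the total number of lines, words, and characters in a text file. It works fine and gives the expected output. However, I want to count the number of characters in each line and print it like this: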
Line No. 1 has 58 Characters
Line No. 2 has 24 Characters
Code:
    import string

    def fileCount(fname):
        #counting variables
        lineCount = 0
        wordCount = 0
        charCount = 0
        words = []
        #file is opened and assigned a variable
        infile = open(fname, 'r')
        #loop that finds the number of lines in the file
        for line in infile:
            lineCount = lineCount + 1
            word = line.split()
            words = words + word
        #loop that finds the number of words in the file
        for word in words:
            wordCount = wordCount + 1
            #loop that finds the number of characters in the file
            for char in word:
                charCount = charCount + 1
        #returns the variables so they can be called to the main function
        return(lineCount, wordCount, charCount)

    def main():
        fname = input('Enter the name of the file to be used: ')
        lineCount, wordCount, charCount = fileCount(fname)
        print ("There are", lineCount, "lines in the file.")
        print ("There are", charCount, "characters in the file.")
        print ("There are", wordCount, "words in the file.")

    main()
The loop
    for line in infile:
        lineCount = lineCount + 1
counts whole lines, but how can I do the character count for each individual line? I am using Python 3.X.
Store all the information in a dictionary, then access it by key.
    def fileCount(fname):
        #counting variables
        d = {"lines":0, "words": 0, "lengths":[]}
        #file is opened and assigned a variable
        with open(fname, 'r') as f:
            for line in f:
                # split into words
                spl = line.split()
                # increase count for each line
                d["lines"] += 1
                # add length of split list which will give total words
                d["words"] += len(spl)
                # get the length of each word and sum
                d["lengths"].append(sum(len(word) for word in spl))
        return d

    def main():
        fname = input('Enter the name of the file to be used: ')
        data = fileCount(fname)
        print ("There are {lines} lines in the file.".format(**data))
        print ("There are {} characters in the file.".format(sum(data["lengths"])))
        print ("There are {words} words in the file.".format(**data))
        # enumerate over the lengths, outputting char count for each line
        for ind, s in enumerate(data["lengths"], 1):
            print("Line: {} has {} characters.".format(ind, s))

    main()
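This code only works for words separated by whitespace, and the whitespace itself is not counted as characters, so keep that in mind.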
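To make the whitespace caveat concrete, here is a minimal sketch that reports two numbers per line: the length of the line without its trailing newline, and the sum of the word lengths used above. The helper name `per_line_counts` and the in-memory sample text are assumptions for illustration, not part of the original answer.

```python
import io

def per_line_counts(lines):
    """Return a (chars_with_spaces, chars_in_words) pair for each line.

    `lines` is any iterable of strings, such as an open text file.
    """
    counts = []
    for line in lines:
        # Drop only the trailing newline so leading and inner spaces still count.
        text = line.rstrip('\n')
        # Summing word lengths ignores every whitespace character.
        in_words = sum(len(word) for word in text.split())
        counts.append((len(text), in_words))
    return counts

# Hypothetical sample standing in for a file on disk.
sample = io.StringIO("hello world\n  two  spaces\n")
for number, (total, in_words) in enumerate(per_line_counts(sample), 1):
    print("Line No. {} has {} Characters ({} in words)".format(number, total, in_words))
```

Which of the two numbers is the right one depends on whether spaces should count as characters for the task at hand.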
Define a set of the allowed characters you wish to count; then you can use len to get most of the data.
Below, I have chosen the character set:
['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~']
    #Define desired character set
    valid_chars = set([chr(i) for i in range(33,127)])
    total_lines = total_words = total_chars = 0
    line_details = []
    with open ('test.txt', 'r') as f:
        for line in f:
            total_lines += 1
            line_char_count = len([char for char in line if char in valid_chars])
            total_chars += line_char_count
            total_words += len(line.split())
            line_details.append("Line %d has %d characters" % (total_lines, line_char_count))
    print ("There are", total_lines, "lines in the file.")
    print ("There are", total_chars, "characters in the file.")
    print ("There are", total_words, "words in the file.")
    for line in line_details:
        print (line)
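As a side note, the same 94-character set can also be assembled from the named constants in the standard string module, which may read more clearly than the numeric range. This is only a sketch of an alternative spelling; the helper name `count_valid` is hypothetical.

```python
import string

# Letters, digits, and punctuation together cover chr(33) through chr(126).
valid_chars = set(string.ascii_letters + string.digits + string.punctuation)

def count_valid(line, allowed=valid_chars):
    """Count the characters of `line` that belong to `allowed`."""
    return sum(1 for char in line if char in allowed)

# Spaces, tabs, and the newline are skipped; only a, b, c and ! are counted.
print(count_valid("a b\tc!\n"))
```

To count spaces as well, adding `' '` to the set is enough; nothing else in the loop needs to change.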
I was given the task of writing a program that prints the number of characters in each line.
As a programming newbie, I found this difficult :(.
This is what I came up with, along with his response:
Here is the core part of your program:
    with open ('data_vis_tips.txt', 'r') as inFile:
        with open ('count_chars_per_line.txt', 'w') as outFile:
            chars = 0
            for line in inFile:
                line = line.strip('\n')
                chars = len(line)
                outFile.write(str(len(line))+'\n')
It can be simplified to:
    with open ('data_vis_tips.txt', 'r') as inFile:
        for line in inFile:
            line = line.strip()
            num_chars = len(line)
            print(num_chars)
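Note that strip() needs no argument. By default it removes whitespace, and '\n' is whitespace. Be aware that strip() with no argument also removes leading and trailing spaces and tabs, so for lines that start or end with them the count will be lower than with strip('\n').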
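The effect of the strip() argument on the count is easy to check directly. The sketch below compares three ways of trimming a line; the function name `line_lengths` and its dictionary keys are hypothetical, chosen only for this illustration.

```python
def line_lengths(line):
    """Return the length of `line` under three ways of trimming it."""
    return {
        # Removes all leading and trailing whitespace, including spaces and tabs.
        "strip": len(line.strip()),
        # Removes newline characters from both ends, nothing else.
        "strip_newline": len(line.strip('\n')),
        # Removes newline characters from the right end only.
        "rstrip_newline": len(line.rstrip('\n')),
    }

# A line with two leading spaces and one trailing space before the newline.
print(line_lengths("  indented text \n"))
```

If leading and trailing spaces should count as characters, rstrip('\n') is probably the safer choice; if they should not, strip() does the job.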
Here is a simpler version using the built-in collections.Counter. A Counter is a specialized dictionary that counts its inputs. We can use the Counter.update() method to feed in all the words (unique or not) from each line:
    from collections import Counter

    def file_count_2(fname):
        line_count = 0
        word_counter = Counter()
        # the with block closes the file once the loop finishes
        with open(fname, 'r') as infile:
            for line in infile:
                line_count += 1
                word_counter.update(line.split())
        word_count = 0
        char_count = 0
        for word, cnt in word_counter.items():
            word_count += cnt
            char_count += cnt * len(word)
        print(word_counter)
        return line_count, word_count, char_count
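Notes:
There is no need to build up the list words (a better approach is to hash only the unique words and store their counts, which is what Counter does), and there is no need to iterate and increment charCount every time we see one occurrence of a word.
If you only want word_count and not char_count, you can take word_count = sum(word_counter.values()) directly, without iterating over word_counter.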
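The shortcut in the last note can be sketched as follows. The function name `count_words_and_chars` and the in-memory sample are assumptions for illustration; the function accepts any iterable of lines, so an open file works the same way.

```python
from collections import Counter
import io

def count_words_and_chars(lines):
    """Return (line_count, word_count, char_count) for an iterable of lines."""
    word_counter = Counter()
    line_count = 0
    for line in lines:
        line_count += 1
        # update() adds one to the tally of every word in the list.
        word_counter.update(line.split())
    # Total words is the sum of all tallies; no explicit loop is needed.
    word_count = sum(word_counter.values())
    # Each unique word contributes its length once per occurrence.
    char_count = sum(cnt * len(word) for word, cnt in word_counter.items())
    return line_count, word_count, char_count

# Hypothetical sample standing in for a file on disk.
sample = io.StringIO("the cat saw the dog\nthe end\n")
print(count_words_and_chars(sample))
```

Because the Counter merges all lines together, this variant gives file-wide totals only; per-line character counts still need a separate list, as in the dictionary-based answer.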