Get word count from a file ignoring comment lines in Python
I am trying to count the number of occurrences of a word in a file, using Python. But I have to ignore comments in the file.
I have a function like this:
def getWordCount(file_name, word):
    count = file_name.read().count(word)
    file_name.seek(0)
    return count
How can I ignore lines that begin with a #?
I know this can be done by reading the file line by line, as stated in this question. Is there a faster, more Pythonic way to do so?
One option is to create a copy of the file that does not contain the comment lines, then run your code on that copy. For example:
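# open() replaces the Python 2-only file() builtin; "with" closes both files.
with open('./file_with_comment.txt') as infile, open('./newfile.txt', 'w') as newopen:
    for line in infile:
        # Copy only lines that are not comments (leading whitespace is ignored).
        if not line.strip().startswith('#'):
            newopen.write(line)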
This drops every line that starts with # (after stripping leading whitespace). Then run your function on newfile.txt:
def getWordCount(file_name, word):
    count = file_name.read().count(word)
    file_name.seek(0)
    return count
A more Pythonic way would be this:
def getWordCount(file_name, word):
    with open(file_name) as wordFile:
        return sum(line.count(word)
                   for line in wordFile
                   if not line.startswith('#'))
A faster approach (which is independent of being Pythonic) could be to read the whole file into one string and then use regular expressions to find the words that are not on a line starting with a hash.
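The regular-expression idea could be sketched roughly as follows. This is only one possible reading of it: the names COMMENT_LINE and count_word_ignoring_comment_lines are made up for illustration, and the pattern assumes that a comment line is one whose very first character is #, matching the startswith('#') check above. Whether it is actually faster has not been measured here.

```python
import re

# Matches a whole line whose first character is '#', plus its trailing newline if any.
# (Hypothetical helper name; not from the original answer.)
COMMENT_LINE = re.compile(r'^#.*\n?', re.MULTILINE)

def count_word_ignoring_comment_lines(text, word):
    """Count substring occurrences of `word` outside lines that start with '#'."""
    return COMMENT_LINE.sub('', text).count(word)

sample = "# word in a comment\nword and word again\n  # indented: word\nlast word\n"
print(count_word_ignoring_comment_lines(sample, 'word'))  # 4
```

Note that the indented "# indented: word" line is still counted, because it does not start with # in column 0. To treat such lines as comments too, the pattern would need to allow leading whitespace.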
You can use a regular expression to filter out comments:
import re
text = """ This line contains a word. # empty
This line contains two: word word # word
newline
# another word
"""
filtered = ''.join(re.split('#.*', text))
print(filtered)
# This line contains a word.
# This line contains two: word word
# newline
print(text.count('word')) # 5
print(filtered.count('word')) # 3
Just replace text with your file_name.read().
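Put together with the function from the question, that substitution might look like the sketch below. The name get_word_count and the io.StringIO stand-in for an open file are assumptions made for the example. Be aware that the '#.*' pattern removes everything from a # to the end of the line, so trailing comments are dropped as well as whole comment lines, and a # that is part of real content would be cut too.

```python
import io
import re

def get_word_count(file_obj, word):
    """Count `word` in an open text file, ignoring everything after a '#' on each line."""
    filtered = re.sub(r'#.*', '', file_obj.read())
    file_obj.seek(0)  # rewind so the file can be read again, as in the question
    return filtered.count(word)

# io.StringIO stands in for a real file opened in text mode.
demo = io.StringIO("word  # word\n# word\nword word\n")
print(get_word_count(demo, 'word'))  # 3
```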