
Get word count from a file ignoring comment lines in Python

I am trying to count the number of occurrences of a word in a file, using Python. But I have to ignore comments in the file.

I have a function like this:

def getWordCount(file_name, word):
  count = file_name.read().count(word)
  file_name.seek(0)
  return count

How can I ignore lines that begin with a # ?

I know this can be done by reading the file line by line, as described in this question. Is there a faster, more Pythonic way to do so?

One option is to create a copy of the file without the comment lines, then run your code on the copy. For example:

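# file() exists only in Python 2; open() works in both Python 2 and 3
with open('./file_with_comment.txt') as infile, open('./newfile.txt', 'w') as newopen:
    for line in infile:
        # Copy only lines that do not begin with '#' (leading whitespace ignored)
        if not line.strip().startswith('#'):
            newopen.write(line)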

This removes every line that starts with #. Then run your function on newfile.txt:

def getWordCount(file_name, word):
  count = file_name.read().count(word)
  file_name.seek(0)
  return count

More Pythonic would be this:

def getWordCount(file_name, word):
  with open(file_name) as wordFile:
    return sum(line.count(word)
      for line in wordFile
      if not line.startswith('#'))

Faster (which is independent of being Pythonic) could be to read the whole file into one string and then use a regular expression to find the words that are not on a line starting with a hash.

You can use a regular expression to filter out comments:
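That idea could look roughly like the sketch below. The helper name count_word_skipping_comment_lines and the sample string are made up for illustration, and whether this is actually faster than the line-by-line version has not been measured here. The pattern is anchored with ^ under re.MULTILINE, so only lines whose first character is # are dropped; trailing comments after other text are kept.

```python
import re

# Hypothetical helper pattern: matches a whole line whose first character is '#'.
# With re.MULTILINE, '^' and '$' match at the start and end of every line.
COMMENT_LINE = re.compile(r'^#.*$', re.MULTILINE)

def count_word_skipping_comment_lines(text, word):
    """Count occurrences of word in text, ignoring lines that start with '#'."""
    without_comment_lines = COMMENT_LINE.sub('', text)
    return without_comment_lines.count(word)

# Made-up sample: one full comment line and one trailing comment.
sample = "word here\n# word in a comment\nanother word  # trailing word\n"
print(count_word_skipping_comment_lines(sample, 'word'))  # 3
```

The trailing "# trailing word" still counts here because the line does not start with a hash, which matches the wording of the question.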

import re

text = """ This line contains a word. # empty
This line contains two: word word  # word
newline
# another word
"""

filtered = ''.join(re.split('#.*', text))
print(filtered)
#  This line contains a word. 
# This line contains two: word word  
# newline

print(text.count('word'))  # 5
print(filtered.count('word'))  # 3

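Just replace text with your file_name.read(). Note that this pattern also strips trailing comments that follow other text on the same line, not only lines that begin with #.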
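Put together with the original function, that might look something like this. It is only a sketch: get_word_count and the io.StringIO stand-in for a real file are assumptions for the demo, and re.sub is used here in place of the split-and-join above, which gives the same filtered text.

```python
import io
import re

def get_word_count(file_obj, word):
    """Count word in an open text file, ignoring everything after a '#'."""
    text = file_obj.read()
    file_obj.seek(0)  # rewind so the caller can read the file again
    # '.' does not match newlines, so each match stops at the end of its line.
    filtered = re.sub('#.*', '', text)
    return filtered.count(word)

# io.StringIO stands in for a real file opened with open(...).
demo = io.StringIO("a word  # word\n# word\nword word\n")
print(get_word_count(demo, 'word'))  # 3
```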


 