How do I search a text file for a list of words taken from user input?

Question

I'm trying to make a simple word counter program in Python 3.4.1. The user enters a list of comma-separated words, and the program then reports how often each of those words occurs in a sample text file.

I'm currently stuck on how to search the text file for the words the user entered.

First I tried:

file = input("What file would you like to open? ")
f = open(file, 'r')
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in search:
    count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])

This resulted in:

What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, rings, the
first 1
rings 1
the 1

If that's anything to go by, I'm guessing this method only counted the words in the input list, not how often those words occur in the text file.

So then I tried:

file = input("What file would you like to open? ")
f = open(file, 'r')
lines = f.readlines()
line = f.readline()
word = line.split()
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in lines:
    if word in search:
        count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])

This gave me nothing back. This is what happened:

What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, the, rings
>>>

What am I doing wrong? How can I fix this problem?

Answer 1

You read all the lines first (into lines), then tried to read one more line, but by then the file had already given you everything. In that case f.readline() returns an empty string. From there on your script is doomed to fail; you cannot count words in an empty string.
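The exhausted-file behaviour can be reproduced without a real file. The following is a minimal sketch that uses io.StringIO as a stand-in for the opened file; the sample text is invented for illustration. It also shows a second reason the loop finds nothing: each element of lines is a whole line, not a single word.

```python
import io

# io.StringIO behaves like a file opened in text mode; the contents are made up.
f = io.StringIO("on the first day\nfive onion rings\n")

lines = f.readlines()   # consumes the whole file
line = f.readline()     # nothing left to read, so this is an empty string

print(lines)        # ['on the first day\n', 'five onion rings\n']
print(repr(line))   # ''

# Each element of lines is a full line (newline included), so testing a
# single search word against the list of lines never matches.
print("first" in lines)   # False
```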

You can loop over the file instead:

file = input("What file would you like to open? ")

search = input("Enter the words you want to search for (separate with commas): ")
search = [word.strip() for word in search.lower().split(",")]

# create a dictionary for all search words, setting each count to 0
count = dict.fromkeys(search, 0)

with open(file, 'r') as f:
    for line in f:
        for word in line.lower().split():
            if word in count:
                # found a word you wanted to count, so count it
                count[word] += 1

The with statement uses the opened file object as a context manager; this just means the file is closed again automatically when the block is done.
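The automatic close can be observed through the file object's closed attribute. This is a small sketch that writes a throwaway file with tempfile so it is self-contained; the file contents are invented.

```python
import os
import tempfile

# Create a temporary file so the example does not depend on an existing one.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("five onion rings\n")
    path = tmp.name

with open(path, 'r') as f:
    first_line = f.readline()
    print(f.closed)   # False: still inside the with block

print(f.closed)       # True: closed automatically on leaving the block
os.remove(path)       # clean up the temporary file
```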

The for line in f: loop iterates over each separate line in the input file; this is more efficient than using f.readlines() to read all lines into memory at once.
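Because the loop only needs something that yields lines, the counting logic can be wrapped in a function that accepts an open file or any other iterable of lines. The sketch below is one way to do that; count_words is a hypothetical helper name, and stripping punctuation from each word is an added assumption that the code above does not make (there, "rings," and "rings" are different words).

```python
import io
import string

def count_words(lines, search):
    """Count how often each word in search occurs in an iterable of lines."""
    count = dict.fromkeys(search, 0)
    for line in lines:
        for word in line.lower().split():
            # Assumption: punctuation attached to a word ("rings," or "day.")
            # should be ignored when matching.
            word = word.strip(string.punctuation)
            if word in count:
                count[word] += 1
    return count

# io.StringIO stands in for an opened text file; the text is invented.
sample = io.StringIO("On the first day,\nfive onion rings.\nThe rings, the rings!\n")
result = count_words(sample, ["first", "rings", "the"])
for word in sorted(result):
    print(word, result[word])
```

With a real file, the call would be count_words(f, search) inside the with open(...) block.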

I also cleaned up your search word stripping a little, and initialised the count dictionary with every search word pre-set to 0; this makes the actual counting a little easier.
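To see what those two steps produce on their own, here is a short sketch with a hard-coded string in place of the input() call; the string raw is invented for illustration.

```python
# Stand-in for what the user would type at the prompt.
raw = "First,  rings , the"

# Lowercase once, split on commas, then strip whitespace around each word.
search = [word.strip() for word in raw.lower().split(",")]

# Every search word becomes a key whose count starts at 0.
count = dict.fromkeys(search, 0)

print(search)   # ['first', 'rings', 'the']
print(sorted(count.items()))
```

Starting every count at 0 also means that words which never appear in the file still show up in the result with a count of 0.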

Because you now have a dictionary with all the search words, testing for matching words is best done against that dictionary. Testing against a dictionary is faster than testing against a list: the latter is a scan that takes longer the more words are in the list, while a dictionary test takes constant time on average, regardless of the number of items in the dictionary.
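The difference can be measured with timeit. This is a rough sketch, not a benchmark: the 1,000 search words are generated placeholders, the target is deliberately the last list element (the worst case for a scan), and the actual timings depend on the machine.

```python
import timeit

# Hypothetical search list of 1,000 made-up words.
words = ["word{}".format(i) for i in range(1000)]
as_list = words
as_dict = dict.fromkeys(words, 0)
target = "word999"   # last element: the list scan has to check every entry

list_time = timeit.timeit(lambda: target in as_list, number=1000)
dict_time = timeit.timeit(lambda: target in as_dict, number=1000)

print("list: {:.6f}s  dict: {:.6f}s".format(list_time, dict_time))
```

For a handful of search words the difference is negligible; it only matters as the search list grows.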

Answer 2

You could try this:

import re
import collections

wanted = ["cat", "dog"]
with open('hamlet.txt') as f:
    # \w+ matches runs of letters, digits and underscores, so punctuation is dropped
    matches = re.findall(r'\w+', f.read().lower())
counts = collections.Counter(matches)  # count each occurrence of every word
for word in wanted:
    # a Counter returns 0 for words it never saw
    print(word, counts[word])

I referenced this solution when forming the answer.
