How can I search a text file for a list of words from user input?

I'm trying to make a simple word counter program in Python 3.4.1 where the user inputs a list of comma-separated words that are then analyzed for frequency in a sample text file.

I'm currently stuck on how to search for the entered list of words in the text file.

First I tried:

file = input("What file would you like to open? ")
f = open(file, 'r')
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in search:
    count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])

This resulted in:

What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, rings, the
first 1
rings 1
the 1

If that's anything to go by, I'm guessing this method only gave me the count of the words in the input list, not the count of those words in the text file. So then I tried:

file = input("What file would you like to open? ")
f = open(file, 'r')
lines = f.readlines()
line = f.readline()
word = line.split()
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in lines:
    if word in search:
        count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])

This gave me nothing back. This is what happened:

What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, the, rings
>>> 

What am I doing wrong? How can I fix this problem?

You read all lines first (into lines), then tried to read just one more line, but the file had already given you all of its lines. In that case f.readline() returns an empty string. From there on out your script is doomed to fail; you cannot count words in an empty string. Your loop also iterates over lines rather than over individual words, so each word it tests is really a whole line (trailing newline included), which will almost never equal one of the search words.
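The behaviour described above can be reproduced without a real file. This is only a minimal sketch: io.StringIO stands in for the object returned by open(), and the two-line sample text is made up for illustration.

```python
import io

# io.StringIO stands in for a real file object; the text is invented
f = io.StringIO("first line\nsecond line\n")

lines = f.readlines()   # consumes the whole file
line = f.readline()     # nothing is left to read, so this is an empty string

print(lines)            # ['first line\n', 'second line\n']
print(repr(line))       # ''
print(line.split())     # []

# Each element of lines is a whole line (newline included), not a word
print("first" in lines)  # False
```

Because the file position is already at the end after readlines(), every later read returns an empty string until the file is reopened or rewound with f.seek(0).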

You can loop over the file instead:

file = input("What file would you like to open? ")

search = input("Enter the words you want to search for (separate with commas): ")
search = [word.strip() for word in search.lower().split(",")]

# create a dictionary for all search words, setting each count to 0
count = dict.fromkeys(search, 0)

with open(file, 'r') as f:
    for line in f:
        for word in line.lower().split():
            if word in count:
                # found a word you wanted to count, so count it
                count[word] += 1

The with statement uses the opened file object as a context manager; this just means it'll be closed again automatically when done.

The for line in f: loop iterates over each separate line in the input file; this is more efficient than using f.readlines() to read all lines into memory at once.

I also cleaned up your search word stripping a little, and initialized the count dictionary with every search word preset to 0; this makes the actual counting a little easier.

Because you now have a dictionary with all the search words, testing for matching words is best done against that dictionary. Testing against a dictionary is faster than testing against a list: a list test is a scan that takes longer the more words are in the list, while a dictionary test takes constant time on average, regardless of the number of items in the dictionary.
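For reuse, the counting loop can be wrapped in a function. The sketch below is one possible arrangement, not part of the original answer: count_words is a hypothetical name, the sample lines are invented, and the punctuation stripping is an added assumption, since split() alone leaves "rings," with its comma attached and it would then not match "rings".

```python
import string

def count_words(lines, search):
    """Count how often each word in search occurs in an iterable of lines."""
    # dictionary keyed by search word, so membership tests are fast
    count = dict.fromkeys(search, 0)
    for line in lines:
        for word in line.lower().split():
            # assumption: drop leading/trailing punctuation before matching
            word = word.strip(string.punctuation)
            if word in count:
                count[word] += 1
    return count

# invented sample text; an open file object could be passed instead
sample = ["On the first day,\n", "Five onion rings, the best rings!\n"]
result = count_words(sample, ["first", "rings", "the"])
for word in sorted(result):
    print(word, result[word])
```

Since iterating over a file object yields its lines, calling count_words(f, search) inside a with open(file, 'r') as f: block should work the same way as with the list used here.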

You could try this:

import re
import collections

wanted = ["cat", "dog"]

# Read the whole file, lowercase it, and extract every word
with open('hamlet.txt') as f:
    matches = re.findall(r'\w+', f.read().lower())

counts = collections.Counter(matches)  # count each occurrence of every word

# Print the count for each wanted word (a word that never occurs counts as 0)
for word in wanted:
    print(word, counts[word])

I referenced this solution when forming the answer.
