Check whether a specific string matches a string in a text file

Question

I have a text file containing many words (a single word on each line). I have to read in each word, modify it, and then check whether the modified word matches any of the words in the file. I am having trouble with the last part (the hasMatch method in my code). It sounds simple enough and I know what I should do, but nothing I try works.

#read in textfile 
myFile = open('good_words.txt')


#function to remove first and last character in string, and reverse string
def modifyString(str):
    rmFirstLast = str[1:len(str)-2] #slicing first and last char
    reverseStr = rmFirstLast[::-1] #reverse string 
    return reverseStr

#go through list of words to determine if any string match modified string
def hasMatch(modifiedStr):
    for line in myFile:
        if line == modifiedStr:
            print(modifiedStr + " found")
        else:
            print(modifiedStr + "not found")

for line in myFile:
    word = str(line) #save string in line to a variable

    #only modify strings that are greater than length 3
    if len(word) >= 4:
        #global modifiedStr #make variable global
        modifiedStr = modifyString(word) #do string modification
        hasMatch(modifiedStr)

myFile.close()

Answer 1

Several problems here:

- You have to strip the lines, otherwise the trailing linefeed/CR characters make the comparison fail.
- You have to read the file once and for all. Otherwise the file iterator is exhausted after the first pass: the loop inside hasMatch consumes the same file object as the outer loop.
- The search is slow: scanning the whole file for every word is a linear search, while a set makes each lookup fast.
- The slicing is overly complicated and wrong: str[1:-1] does it (thanks to those who commented on my answer).
- The whole code is really too long and complex. I summed it up in a few lines.
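A minimal sketch of the first two pitfalls, using io.StringIO as a stand-in for good_words.txt (an assumption, so the snippet runs without the file): lines read from a file keep their trailing newline, and a file object can only be iterated once.

```python
import io

# io.StringIO stands in for open('good_words.txt'); it iterates line by line the same way
fake_file = io.StringIO("most\nso\n")

first_pass = list(fake_file)    # ['most\n', 'so\n']: each line keeps its '\n'
second_pass = list(fake_file)   # []: the iterator is already exhausted

print(first_pass[1] == "so")          # False, because of the trailing '\n'
print(first_pass[1].strip() == "so")  # True once the line is stripped
print(second_pass)                    # []
```

In the question's code the same thing happens to myFile: the first call to hasMatch reads it to the end, so the outer loop has nothing left to iterate.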

Code:

#read in textfile
myFile = open('good_words.txt')
# make a set (faster search), remove linefeeds
lines = set(x.strip() for x in myFile)
myFile.close()

# iterate on the lines
for word in lines:
    #only consider strings that are greater than length 3
    if len(word) >= 4:
        modifiedStr = word[1:-1][::-1] #do string modification
        if modifiedStr in lines:
            print(modifiedStr + " found (was "+word+")")
        else:
            print(modifiedStr + " not found")

I tested the program on a list of common English words and got these matches:

so found (was most)
or found (was from)
no found (was long)
on found (was know)
to found (was both)
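The same set-based logic can be wrapped in a function that takes any iterable of lines, which makes it testable without good_words.txt. The function name find_matches and the sample word list are hypothetical; the sample simply contains the words behind the five matches above, plus one word that has no match.

```python
def find_matches(words):
    """Return (modified, original) pairs where the modified word is also in words.

    The modification is the one from the question: drop the first and last
    character, then reverse what is left.
    """
    word_set = set(w.strip() for w in words)  # strip newlines; set gives fast lookups
    matches = []
    for word in sorted(word_set):  # sorted only to make the output order stable
        if len(word) >= 4:
            modified = word[1:-1][::-1]
            if modified in word_set:
                matches.append((modified, word))
    return matches


# hypothetical sample input, written as raw lines with their newlines
sample = ["most\n", "from\n", "long\n", "know\n", "both\n",
          "so\n", "or\n", "no\n", "on\n", "to\n", "file\n"]
for modified, original in find_matches(sample):
    print(modified + " found (was " + original + ")")
```

Returning the pairs instead of printing inside the loop keeps the search separate from the output, so the caller decides what to do with the matches.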

Edit: another version, which drops the set and uses bisect on a sorted list to avoid hashing/hash collisions.

import bisect

#read in textfile
myFile = open("good_words.txt")
lines = sorted(x.strip() for x in myFile) # make a sorted list, remove linefeeds
myFile.close()

for word in lines:

    #only modify strings that are greater than length 3
    if len(word) >= 4:
        modifiedStr = word[1:-1][::-1] #do string modification
        # search where to insert the modified word
        i = bisect.bisect_left(lines, modifiedStr)
        # if it can be inserted and the word is actually at this position: found
        if i < len(lines) and lines[i] == modifiedStr:
            print(modifiedStr + " found (was " + word + ")")
        else:
            print(modifiedStr + " not found")
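The membership check at the heart of this version can be isolated as a small helper; contains_sorted is a hypothetical name, not part of the bisect module. Each lookup costs O(log n) on the sorted list, versus O(1) on average for a set, so this variant trades a little speed for not hashing.

```python
import bisect

def contains_sorted(sorted_words, target):
    """Binary-search membership test on an already sorted list."""
    i = bisect.bisect_left(sorted_words, target)  # leftmost insertion point for target
    # target is present only if that position exists and holds an equal element
    return i < len(sorted_words) and sorted_words[i] == target


words = sorted(["most", "so", "from", "or", "file"])
print(contains_sorted(words, "so"))   # True
print(contains_sorted(words, "li"))   # False
print(contains_sorted(words, "zzz"))  # False: insertion point is len(words)
```

The i < len(sorted_words) guard matters for targets that sort after every element, where bisect_left returns the list length and indexing would raise IndexError.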

Answer 2

In your code, you're not slicing off just the first and last character, but the first character and the last two characters.

rmFirstLast = str[1:len(str)-2]

Change that to:

rmFirstLast = str[1:len(str)-1]
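A quick check of what the slices do, on a hypothetical word. One subtlety: a line read from the file without stripping still ends in '\n', so on such a raw line -2 removes the last letter plus the newline, while on a stripped word it removes two letters. Stripping first and then slicing with [1:-1] gives the same result either way.

```python
raw_line = "most\n"          # a line as read from the file, newline included
word = raw_line.strip()      # 'most'

print(word[1:len(word) - 2])          # 'o': first char and last two chars removed
print(word[1:len(word) - 1])          # 'os': first and last char removed
print(word[1:-1])                     # 'os': same slice, shorter spelling
print(raw_line[1:len(raw_line) - 2])  # 'os': here -2 drops the last letter and the '\n'
```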

2 answers

Answer 1: 2 votes, accepted, 2016-09-03 18:19:46

Answer 2: 0 votes, 2016-09-03 18:23:57
