
Regex case-insensitive searches not matching the exact word

I'm using the following regex to search for three different string formats at once. I'm also using re.IGNORECASE to match upper- and lower-case strings. However, when I perform a search (e.g. 'locality'), I also get matches for 'localit', 'locali', 'local' and so on. I want to match only the exact word (e.g. 'locality').

Also, if there is white space between the characters of the string (e.g. 'l ocal i ty'), I want to ignore it. I have not found a re method that allows me to do that. I tried using re.ASCII, but I get an error: "...ascii is invalid." Any assistance is appreciated.

elif searchType =='2':
  print "  Directory to be searched: c:\Python27 "
  directory = os.path.join("c:\\","Python27")
  userstring = raw_input("Enter a string name to search: ")
  userStrHEX = userstring.encode('hex')
  userStrASCII = ' '.join(str(ord(char)) for char in userstring)
  regex = re.compile(r"(%s|%s|%s)" % ( re.escape( userstring ), re.escape( userStrHEX ), re.escape( userStrASCII )), re.IGNORECASE)
  for root,dirname, files in os.walk(directory):
     for file in files:
         if file.endswith(".log") or file.endswith(".txt"):
            f=open(os.path.join(root, file))
            for line in f.readlines():
               #if userstring in line:
               if regex.search(line):       
                  print "file: " + os.path.join(root,file)           
                  break
            else:
               #print "String NOT Found!"
               break
            f.close()

There is no such flag in re, so either:

  • construct a regex with explicit whitespace matching after every character:

    r'\s*'.join(re.escape(c) for c in userstring)

    This works: with the pattern compiled as myre using re.IGNORECASE, myre.findall(line) finds 'l Oc ALi ty'.

  • or (if you only need to detect matches to the pattern, and not do anything further with the actual match text) use str.translate(None, deletechars) to strip whitespace from the line before matching, e.g. call line.translate(None, ' \t\n\r').lower() before you try to match. (Keep a copy of the original, unstripped line.)
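
A minimal sketch of both options, written so that it should run on Python 2.7 as well as Python 3. The helper names build_spaced_pattern and contains_ignoring_whitespace are hypothetical. The (?<!\w) and (?!\w) lookarounds are an added assumption, not part of the answer above: they are one way to keep the word from matching inside a longer word. The second helper uses ''.join(line.split()) instead of line.translate(None, ...), because that two-argument form of translate only exists on Python 2 byte strings.

```python
import re


def build_spaced_pattern(word):
    """Compile a case-insensitive regex that matches `word` as a whole word,
    allowing optional whitespace between its characters."""
    # Escape each character separately, then allow any whitespace between them.
    body = r"\s*".join(re.escape(ch) for ch in word)
    # The lookarounds reject a match that starts or ends inside a longer word.
    return re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE)


def contains_ignoring_whitespace(word, line):
    """Detection only: remove all whitespace from `line`, then do a
    case-insensitive substring test (word boundaries are lost here)."""
    squeezed = "".join(line.split())
    return word.lower() in squeezed.lower()


pattern = build_spaced_pattern("locality")
found = pattern.search("the l Oc ALi ty here")
missed = pattern.search("only local news")
```

Here found is a match object and missed is None. Note that once whitespace has been stripped from the line, the second helper can no longer tell where words begin or end, so it only answers whether the characters occur in sequence.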

