Python: Find all words ending with a given text (re.findall)

Question

Load macOS.txt into a variable text. Then do the following: Find all the occurrences of macOS, Mac OS, and OS X in the text. Put the results in one list. Print the list of those words, then print the following: There are {length of list} words mentioning macOS, Mac OS, or OS X in the text.

I think I should use a regular expression, such as re.findall or re.finditer. Can anyone correct my code below?

text = open("macOS.txt", "r")  
import re
pattern = '[A-Za-z0-9-]+' 
lines = "OS"  
ls = re.findall(pattern,lines)
print(ls)

But how do I find all the occurrences of macOS, Mac OS, and OS X in the text?

Or this?

import re
with open('macOS.txt', 'r') as f:
  content = f.read()
temp = re.findall(\b(?!\w*OS\b)\w*OS\b)
print(f'There are {len(temp)} words ended with OS (other than OS and macOS) in the text.')

Answer 1

You can use the fuzzywuzzy library. After finding 'OS', take a few letters before and after it and use fuzzywuzzy to compare them against the names you are looking for. https://www.geeksforgeeks.org/fuzzywuzzy-python-library/

Alternatively, if your output is limited to one word before and after 'OS', then you can just do this:

if the word contains OS (e.g. macOS)
find one word prior to OS => see if it's 'Mac' => concat them
find one word after OS => see if it's 'X' => concat them
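
The steps above could be sketched roughly as follows. This is only one possible reading of the idea, not a tested solution for the actual macOS.txt: the function name find_mac_names and the PUNCT set are made up for illustration, and the split-on-whitespace tokenizing is an assumption.

```python
# Hypothetical helper: names and punctuation set are illustrative assumptions.
PUNCT = ".,;:!?()[]\"'"

def find_mac_names(text):
    """Collect 'macOS', 'Mac OS' and 'OS X' by looking at neighbouring words."""
    words = text.split()
    found = []
    for i, raw in enumerate(words):
        word = raw.strip(PUNCT)  # drop punctuation stuck to the word
        if word == "macOS":
            found.append("macOS")
        elif word == "OS":
            # Look one word back and one word ahead, as described above.
            prev_word = words[i - 1].strip(PUNCT) if i > 0 else ""
            next_word = words[i + 1].strip(PUNCT) if i + 1 < len(words) else ""
            if prev_word == "Mac":
                found.append("Mac OS")
            elif next_word == "X":
                found.append("OS X")
    return found

sample = "Mac OS became OS X, which was later renamed macOS."
print(find_mac_names(sample))
```

Because punctuation is stripped before comparing, a sequence such as "Mac, OS" would also be counted as "Mac OS"; a regular expression handles such cases more precisely.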

Answer 2

Use

re.findall(r'\b(?:(?:Mac |mac)OS|OS X)\b', s)

See proof.

EXPLANATION

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      Mac                      'Mac '
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      mac                      'mac'
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    OS                       'OS'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    OS X                     'OS X'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
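
To tie this back to the original task, here is a minimal sketch that applies the pattern and prints the list and the count. The sample string and the helper name find_mentions are assumptions for illustration; for the real exercise you would read macOS.txt instead, for example with `open('macOS.txt').read()`.

```python
import re

# Same pattern as in the answer; it has no capturing groups,
# so re.findall returns the full matched strings.
PATTERN = re.compile(r'\b(?:(?:Mac |mac)OS|OS X)\b')

def find_mentions(text):
    """Return every occurrence of 'Mac OS', 'macOS' or 'OS X' in text."""
    return PATTERN.findall(text)

# Assumed sample text standing in for the contents of macOS.txt.
sample = ("Mac OS X was renamed OS X in 2012 and macOS in 2016; "
          "iOS is a separate system.")

mentions = find_mentions(sample)
print(mentions)
print(f"There are {len(mentions)} words mentioning macOS, Mac OS, or OS X in the text.")
```

Note that in "Mac OS X" only "Mac OS" is reported: the match consumes "OS", so the following "X" cannot start a second match.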
