
How to extract words containing only letters from a text in Python?

For example, in the following text:

"We’d love t0 help 123you, but the real1ty is th@t n0t every question gets answered. To improve your chances, here are some tips:"

How can I easily extract only the words that consist entirely of letters:

love, help, but,... To,... tips

I tried:

import re
words = re.findall(r'^[a-zA-Z]+', str)
for word in words:
    print(word)

where str is the text. This does some of the work, but I need to tweak it somehow.
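A quick check of what this attempt returns on the sample text may help pin down what needs tweaking. This is a sketch: the text is bound to `text` rather than `str` (an illustrative rename, to avoid shadowing the built-in). Without `re.MULTILINE`, `^` matches only at the very start of the string, so at most one word is found; dropping the anchor entirely instead pulls letter runs out of mixed tokens such as `t0` and `123you`.

```python
import re

# Sample text from the question; "text" is an illustrative name, not the original "str".
text = ("We’d love t0 help 123you, but the real1ty is th@t n0t every question "
        "gets answered. To improve your chances, here are some tips:")

# ^ matches only at position 0 (no re.MULTILINE), so only "We" is found;
# the curly apostrophe in "We’d" is not in [a-zA-Z], which ends the match.
anchored = re.findall(r'^[a-zA-Z]+', text)
print(anchored)

# With no anchor or boundary at all, letter runs are extracted from mixed
# tokens, e.g. "t" from "t0" and "you" from "123you".
unanchored = re.findall(r'[a-zA-Z]+', text)
print(unanchored[:6])
```

So the pattern needs something that checks what surrounds each letter run, not just where the string begins.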

Any ideas on how to do this with regular expressions?

You can use a list comprehension:

s = "We’d love t0 help 123you, but the real1ty is th@t n0t every question gets answered. To improve your chances, here are some tips:"
print([i for i in s.split() if i.isalpha()])
  • s.split() splits the input on whitespace.
  • Iterate over the resulting tokens and keep only those made up entirely of letters (str.isalpha()). Note that tokens with punctuation attached, such as "answered." or "tips:", are discarded too.
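Below is a runnable version of this answer on the sample text. The second comprehension is an assumption, not part of the original answer: it trims leading and trailing ASCII punctuation with `str.strip(string.punctuation)` before calling `isalpha()`, which recovers words such as "answered" and "tips" while still rejecting tokens with digits or symbols inside them.

```python
import string

s = ("We’d love t0 help 123you, but the real1ty is th@t n0t every question "
     "gets answered. To improve your chances, here are some tips:")

# The answer's approach: whitespace-separated tokens consisting only of letters.
words = [i for i in s.split() if i.isalpha()]
print(words)

# Variant (not from the original answer): strip ASCII punctuation from both
# ends of each token first, so "answered." and "tips:" become words.
# Inner characters are untouched, so "th@t" and "We’d" are still rejected
# (the curly apostrophe is not in string.punctuation, which is ASCII-only).
trimmed = [w for w in (t.strip(string.punctuation) for t in s.split()) if w.isalpha()]
print(trimmed)
```

The variant gets closer to the list in the question (it includes "tips"), at the cost of one extra pass over each token.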

Use

re.findall(r'(?<!\S)[A-Za-z]+(?!\S)', x)
re.findall(r'\b[A-Za-z]+\b', x)
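Running both ASCII patterns on the question's text (bound to `x`, as in the answer) shows how they differ. The whitespace-lookaround version accepts only tokens that are letters from end to end, while the `\b` version also matches letter runs next to punctuation. That is how it picks up "answered" and "tips", but it also splits "We’d" into "We" and "d", and "th@t" into "th" and "t".

```python
import re

x = ("We’d love t0 help 123you, but the real1ty is th@t n0t every question "
     "gets answered. To improve your chances, here are some tips:")

# Letters only, with whitespace (or the start/end of the string) on both sides.
by_whitespace = re.findall(r'(?<!\S)[A-Za-z]+(?!\S)', x)

# Letters only, with a word boundary on both sides: punctuation counts as a
# delimiter, but an adjacent digit or underscore does not.
by_boundary = re.findall(r'\b[A-Za-z]+\b', x)

print(by_whitespace)
print(by_boundary)
```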

Or, with Unicode support:

re.findall(r'(?<!\S)[^\W\d_]+(?!\S)', x)
re.findall(r'\b[^\W\d_]+\b', x)
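`[^\W\d_]` reads as "a word character that is not a digit or an underscore", which leaves letters. For `str` patterns in Python 3, `\w` is Unicode-aware, so this class also covers accented and non-Latin letters. The sample string below is made up purely for illustration.

```python
import re

# Hypothetical sample: accented letters, an underscore, a trailing digit, CJK text.
x = "café über_cool résumé2 日本語 ok"

# ASCII-only letters, whitespace-delimited: "café" fails because é is not in [A-Za-z].
ascii_only = re.findall(r'(?<!\S)[A-Za-z]+(?!\S)', x)

# Any Unicode letter, whitespace-delimited: "über_cool" and "résumé2" still
# fail because the underscore and the digit are excluded from the class.
unicode_letters = re.findall(r'(?<!\S)[^\W\d_]+(?!\S)', x)

print(ascii_only)
print(unicode_letters)
```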

See the regex proof.

Use (?<!\S) and (?!\S) to match only words delimited by whitespace (or by the start or end of the string). Use \b if you also want words that touch punctuation, such as "tips" in "tips:".

EXPLANATION (covers the tokens used across all four patterns above)

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  [A-Za-z]+                any character of: 'A' to 'Z', 'a' to 'z'
                           (1 or more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  [^\W\d_]+                any character except: non-word characters
                           (\W), digits (\d) and '_' (1 or more times
                           (matching the most amount possible)); in
                           Python 3 str patterns this is any letter,
                           including non-ASCII letters
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-ahead
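The same token-by-token breakdown can be written into the pattern itself with `re.VERBOSE`, which ignores unescaped whitespace outside character classes and allows `#` comments. This sketch covers only the Unicode, whitespace-delimited variant; the name `LETTER_WORD` is illustrative.

```python
import re

# Whitespace-delimited words made of letters only (Unicode-aware for str patterns).
LETTER_WORD = re.compile(r"""
    (?<!\S)      # not preceded by a non-whitespace character
    [^\W\d_]+    # one or more letters: word characters minus digits and '_'
    (?!\S)       # not followed by a non-whitespace character
""", re.VERBOSE)

text = ("We’d love t0 help 123you, but the real1ty is th@t n0t every question "
        "gets answered. To improve your chances, here are some tips:")

print(LETTER_WORD.findall(text))
```

Compiling once and reusing the pattern object also avoids re-parsing the regex when it is applied to many strings.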

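Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please include this site's URL or the original address. For any questions, contact yoyou2525@163.com.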
