Getting words between m and n characters
I am trying to get all names that start with a capital letter and end with a full stop on the same line, where the number of characters is between 3 and 5.
My text is as follows:
King. Great happinesse
Rosse. That now Sweno, the Norwayes King,
Craues composition:
Nor would we deigne him buriall of his men,
Till he disbursed, at Saint Colmes ynch,
Ten thousand Dollars, to our generall vse
King. No more that Thane of Cawdor shall deceiue
Our Bosome interest: Goe pronounce his present death,
And with his former Title greet Macbeth
Rosse. Ile see it done
King. What he hath lost, Noble Macbeth hath wonne.
I am testing it out on this link. I am trying to get all words that are between 3 and 5 characters long but haven't succeeded.
Does this produce your desired output?
import re
re.findall(r'[A-Z].{2,4}\.', text)
When text contains the text in your question, it will produce this output:
['King.', 'Rosse.', 'King.', 'Rosse.', 'King.']
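To try this without a separate file, here is a minimal self-contained sketch with the sample text pasted into a string. Embedding the text inline (rather than reading it from wherever the question keeps it) is an assumption made only for illustration; the variable name text matches the answer.

```python
import re

# The sample text from the question, pasted inline for illustration.
text = """King. Great happinesse
Rosse. That now Sweno, the Norwayes King,
Craues composition:
Nor would we deigne him buriall of his men,
Till he disbursed, at Saint Colmes ynch,
Ten thousand Dollars, to our generall vse
King. No more that Thane of Cawdor shall deceiue
Our Bosome interest: Goe pronounce his present death,
And with his former Title greet Macbeth
Rosse. Ile see it done
King. What he hath lost, Noble Macbeth hath wonne."""

# A capital letter, then 2 to 4 more characters, then a literal full stop.
# '.' does not match a newline by default, so a match never spans two lines.
matches = re.findall(r'[A-Z].{2,4}\.', text)
print(matches)  # ['King.', 'Rosse.', 'King.', 'Rosse.', 'King.']
```

"Macbeth" never matches because there are more than 4 characters between its capital letter and the nearest full stop.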
The regex pattern matches an initial capital letter followed by any 2 to 4 characters (other than a newline) and then a literal full stop. You can tighten that up if required, e.g. using [a-z]: the pattern [A-Z][a-z]{2,4}\. would match an upper case character followed by between 2 and 4 lowercase characters followed by a literal dot/period.
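A small sketch comparing the two patterns on a hypothetical input chosen to show where they differ. In the loose version, the "any character" class also accepts spaces and punctuation:

```python
import re

# Hypothetical input, chosen only to show where the two patterns differ.
sample = "Go on. King."

loose = re.findall(r'[A-Z].{2,4}\.', sample)      # any characters after the capital
tight = re.findall(r'[A-Z][a-z]{2,4}\.', sample)  # lowercase letters only

print(loose)  # ['Go on.', 'King.']
print(tight)  # ['King.']
```

In the loose pattern, the space in "Go on." counts as one of the 2 to 4 characters, while [a-z] rules it out. Note that the tight pattern can still start in the middle of a word (it finds "King." inside a hypothetical "McKing."); prefixing it with \b prevents that.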
If you don't want duplicates, you can use a set to get rid of them:
>>> set(re.findall(r'[A-Z].{2,4}\.', text))
{'Rosse.', 'King.'}
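A set does not preserve the order in which the names first appear. If that order matters, one common alternative (swapped in here, not part of the original answer) is dict.fromkeys. The short text below is an excerpt of the question's sample:

```python
import re

# Excerpt of the question's text (three lines only, for brevity).
text = "King. Great happinesse\nRosse. Ile see it done\nKing. What he hath lost"

matches = re.findall(r'[A-Z].{2,4}\.', text)

# Dict keys keep insertion order (guaranteed since Python 3.7), so this
# drops duplicates while keeping the first occurrence of each name.
unique = list(dict.fromkeys(matches))
print(unique)  # ['King.', 'Rosse.']
```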
You may have your own reasons for wanting to use regexes here, but Python provides a rich set of string methods and (IMO) the code is easier to understand using these:
matched_words = []
with open('text.txt') as f:
    for line in f:
        for word in line.split():
            # Capital first letter, trailing full stop, 3 to 5 characters before the stop
            if word[0].isupper() and word[-1] == '.' and 3 <= len(word) - 1 <= 5:
                matched_words.append(word)
print(matched_words)
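The same string-method check can be wrapped in a function that takes a string instead of reading a file. This is a sketch: the function name find_short_names and its min_len/max_len parameters are hypothetical, added only for illustration.

```python
def find_short_names(text, min_len=3, max_len=5):
    """Return whitespace-separated words that start with an upper-case letter,
    end with a full stop, and have min_len..max_len characters before the stop."""
    matched_words = []
    # str.split() with no argument never yields empty strings, so word[0] is safe.
    for word in text.split():
        if word[0].isupper() and word[-1] == '.' and min_len <= len(word) - 1 <= max_len:
            matched_words.append(word)
    return matched_words


sample = "King. Great happinesse\nRosse. Ile see it done\nGo on. Macbeth."
print(find_short_names(sample))  # ['King.', 'Rosse.']
```

Because it splits on whitespace first, this version never matches across a space. For example, it ignores "Go on.", which the loose regex [A-Z].{2,4}\. would accept. "Macbeth." is also rejected because it has 7 characters before the full stop.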