
Getting words between m and n characters

I am trying to get all names that start with a capital letter and end with a full stop on the same line, where the number of characters is between 3 and 5.

My text is as follows:

 King. Great happinesse

 Rosse. That now Sweno, the Norwayes King,
Craues composition:
Nor would we deigne him buriall of his men,
Till he disbursed, at Saint Colmes ynch,
Ten thousand Dollars, to our generall vse

 King. No more that Thane of Cawdor shall deceiue
Our Bosome interest: Goe pronounce his present death,
And with his former Title greet Macbeth

 Rosse. Ile see it done

 King. What he hath lost, Noble Macbeth hath wonne.

I am testing it out on this link. I am trying to get all words between 3 and 5 characters long but haven't succeeded.

Does this produce your desired output?

import re

re.findall(r'[A-Z].{2,4}\.', text)

When text contains the text in your question, it produces this output:

['King.', 'Rosse.', 'King.', 'Rosse.', 'King.']

The regex pattern matches an initial capital letter followed by 2 to 4 characters of any kind (other than a newline) and then a literal period. You can tighten that up if required, e.g. by using [a-z]: the pattern [A-Z][a-z]{2,4}\. would match an uppercase character followed by 2 to 4 lowercase characters followed by a literal dot/period.
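To see where the loose and the tightened pattern disagree, here is a minimal sketch. The sample string is made up for illustration and is not part of the question's text.

```python
import re

# Hypothetical sample, chosen so that the two patterns give different results.
sample = "King. A b c. Rosse. Macbeth."

# Loose: a capital letter, then any 2 to 4 characters, then a period.
loose = re.findall(r'[A-Z].{2,4}\.', sample)

# Tight: a capital letter, then 2 to 4 lowercase letters, then a period.
tight = re.findall(r'[A-Z][a-z]{2,4}\.', sample)

print(loose)  # ['King.', 'A b c.', 'Rosse.']
print(tight)  # ['King.', 'Rosse.']
```

The loose pattern accepts "A b c." because `.` also matches spaces. Both patterns reject "Macbeth." because it has more than 4 characters between the capital and the period.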

If you don't want duplicates, you can use a set to get rid of them:

>>> set(re.findall(r'[A-Z].{2,4}\.', text))
{'Rosse.', 'King.'}
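A set does not keep the order in which the names first appear. If that order matters, one possible alternative (not part of the original answer) is `dict.fromkeys`, which preserves insertion order in Python 3.7 and later. The `text` below is a shortened excerpt of the question's sample.

```python
import re

# Shortened excerpt of the text from the question.
text = """ King. Great happinesse

 Rosse. That now Sweno, the Norwayes King,

 King. No more that Thane of Cawdor shall deceiue

 Rosse. Ile see it done
"""

matches = re.findall(r'[A-Z].{2,4}\.', text)

# dict keys are unique and keep first-seen order, so this removes
# duplicates without shuffling the results.
unique_in_order = list(dict.fromkeys(matches))

print(unique_in_order)  # ['King.', 'Rosse.']
```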

You may have your own reasons for wanting to use regexes here, but Python provides a rich set of string methods and (IMO) it's easier to understand the code using these:

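matched_words = []
with open('text.txt') as f:  # the context manager closes the file when done
    for line in f:
        for word in line.split():
            # Capitalised, ends with a period, 3 to 5 characters before the period
            if word[0].isupper() and word[-1] == '.' and 3 <= len(word) - 1 <= 5:
                matched_words.append(word)
print(matched_words)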
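The same string-method idea can be generalised to arbitrary bounds m and n, as in the question's title. This is only a sketch: the function name `names_between` and its parameters are hypothetical, and it works on an in-memory string rather than a file.

```python
def names_between(text, m, n):
    """Return words that start with a capital letter, end with a period,
    and have between m and n characters before that period."""
    matched = []
    for line in text.splitlines():
        for word in line.split():
            # split() never yields empty strings, so word[0] is safe here.
            if word[0].isupper() and word.endswith('.') and m <= len(word) - 1 <= n:
                matched.append(word)
    return matched


# Excerpt of the text from the question.
sample = (
    " King. Great happinesse\n"
    "\n"
    " Rosse. Ile see it done\n"
    "\n"
    " King. What he hath lost, Noble Macbeth hath wonne.\n"
)

result_3_5 = names_between(sample, 3, 5)
result_3_4 = names_between(sample, 3, 4)

print(result_3_5)  # ['King.', 'Rosse.', 'King.']
print(result_3_4)  # ['King.', 'King.']
```

Note that "wonne." is skipped because it does not start with a capital letter, and narrowing the range to 3 to 4 drops "Rosse.".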

