
How to find all words in a string that begin with an uppercase letter, for multiple strings in a list

I have a list of strings; each string is about 10 sentences long. I am hoping to find all words in each string that begin with a capital letter, preferably excluding the first word of each sentence. I am using re.findall to do this. When I manually set string = '' I have no trouble doing this; however, when I try to use a for loop to loop over each entry in my list, I get a different output.

import re

for i in list_3:
    string = i
    test = re.findall(r"(\b[A-Z][a-z]*\b)", string)
print(test)

Output:

['I', 'I', 'As', 'I', 'University', 'Illinois', 'It', 'To', 'It', 'I', 'One', 'Manu', 'I', 'I', 'Once', 'And', 'Through', 'I', 'I', 'Most', 'Its', 'The', 'I', 'That', 'I', 'I', 'I', 'I', 'I', 'I']

When I manually input the string value:

txt = 0
for i in list_3:
    string = list_3[txt]  # txt is never incremented, so this always reads list_3[0]
    test = re.findall(r"(\b[A-Z][a-z]*\b)", string)
print(test)

Output:

['Remember', 'The', 'Common', 'App', 'Do', 'Your', 'Often', 'We', 'Monica', 'Lannom', 'Co', 'Founder', 'Campus', 'Ventures', 'One', 'Break', 'Campus', 'Ventures', 'Universities', 'Undermatching', 'Stanford', 'Yale', 'Undermatching', 'What', 'A', 'Yale', 'Lannom', 'There', 'During', 'Some', 'The', 'Lannom', 'That', 'It', 'Lannom', 'Institutions', 'University', 'Chicago', 'Boston', 'College', 'These', 'Students', 'If', 'Lannom', 'Recruiting', 'Elite', 'Campus', 'Ventures', 'Understanding', 'Campus', 'Ventures', 'The', 'For', 'Lannom', 'What', 'I', 'Wish', 'I', 'Knew', 'Before', 'Starting', 'Company', 'I', 'Even', 'I', 'Lannom', 'The', 'There']

But I can't seem to write a for loop that correctly prints the output for each of the 5 items in the list. Any ideas?

The easiest way to do that is to write a for loop that checks whether the first letter of each element of the list is capitalized. If it is, the element is appended to the output list.

output = []
for i in list_3:
    if i[0] == i[0].upper():
        output.append(i)
print(output)

We can also use a list comprehension and do that in one line. It likewise checks whether the first letter of each element is a capital letter.

output = [x for x in list_3 if x[0].upper() == x[0]]
print(output)
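One subtlety in both versions: x[0].upper() == x[0] is also true when the first character is not a letter at all (a digit or punctuation mark), because upper() leaves such characters unchanged. Here is a small comparison with str.isupper(), using a made-up word list for illustration:

```python
# Hypothetical sample list of words, for illustration only
words = ["Apple", "banana", "42nd", "!wow", "Zebra"]

# Comparison used above: true for uppercase letters AND for non-letters
by_upper = [w for w in words if w[0].upper() == w[0]]

# str.isupper() is false for digits and punctuation, so only capitalized words remain
by_isupper = [w for w in words if w[0].isupper()]

print(by_upper)    # ['Apple', '42nd', '!wow', 'Zebra']
print(by_isupper)  # ['Apple', 'Zebra']
```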

EDIT

Since each element of your list is a whole passage of sentences, here is the solution. We iterate over list_3, then over every word using the split() function, and check whether the word is capitalized. If it is, it is added to output.

list_3 = ["Remember your college application process? The tedious Common App applications, hours upon hours of research, ACT/SAT, FAFSA, visiting schools, etc. Do you remember who helped you through this process? Your family and guidance counselors perhaps, maybe your peers or you may have received little to no help"]
output = []
for i in list_3:
    for j in i.split():
        if j[0].isupper():
            output.append(j)
print(output)
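Because split() only breaks on whitespace, punctuation stays attached to the words (for example 'FAFSA,' in the text above). A possible variation, sketched here under the assumption that leading and trailing punctuation should be dropped, trims each word with str.strip(string.punctuation) before testing it. The sample text is a shortened copy of the list_3 string above:

```python
import string

# Shortened copy of the sample text from the answer above
list_3 = ["Remember your college application process? The tedious Common App applications, hours upon hours of research, ACT/SAT, FAFSA, visiting schools, etc."]

output = []
for text in list_3:
    for word in text.split():
        # Remove punctuation only at the ends of the word ('FAFSA,' -> 'FAFSA')
        cleaned = word.strip(string.punctuation)
        # cleaned can be empty if the "word" was pure punctuation
        if cleaned and cleaned[0].isupper():
            output.append(cleaned)
print(output)  # ['Remember', 'The', 'Common', 'App', 'ACT/SAT', 'FAFSA']
```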

Assuming sentences are separated by one space, you could use re.findall with the following regular expression.

r'(?m)(?<!^)(?<![.?!] )[A-Z][A-Za-z]*'


Python's regex engine performs the following operations.

(?m)         : set multiline mode so that ^ and $ match the beginning
               and the end of a line
(?<!^)       : negative lookbehind asserts current location is not
               at the beginning of a line
(?<![.?!] )  : negative lookbehind asserts current location is not
               preceded by '.', '?' or '!', followed by a space
[A-Z]        : match an uppercase letter
[A-Za-z]*    : match 0+ letters
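A quick check of the pattern on a made-up passage (single spaces between sentences, as the answer assumes). The first word of the string and the words following '? ' are skipped:

```python
import re

# Pattern from the answer above
PATTERN = r'(?m)(?<!^)(?<![.?!] )[A-Z][A-Za-z]*'

# Made-up sample passage with one space after each sentence
text = "Remember the Common App? Do you remember who helped you? Your family maybe."

result = re.findall(PATTERN, text)
print(result)  # ['Common', 'App']
```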

If sentences can be separated by one or two spaces, insert the negative lookbehind (?<![.?!]  ) (with two spaces) after (?<![.?!] ).

If the PyPI regex module were used, one could use the variable-length lookbehind (?<![.?!] +).
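To see why the extra lookbehind matters for the one-or-two-spaces case, here is a sketch comparing the single-lookbehind pattern with a version that also contains (?<![.?!]  ) (two spaces), using only the standard re module. The sample string is invented for illustration:

```python
import re

# Only checks for '.', '?' or '!' followed by ONE space
ONE_SPACE = r'(?m)(?<!^)(?<![.?!] )[A-Z][A-Za-z]*'
# Adds a second lookbehind for the same punctuation followed by TWO spaces
TWO_SPACES = r'(?m)(?<!^)(?<![.?!] )(?<![.?!]  )[A-Z][A-Za-z]*'

text = "One sentence.  Two spaces here. Then Bob said hi."

one_space = re.findall(ONE_SPACE, text)
two_spaces = re.findall(TWO_SPACES, text)
print(one_space)   # ['Two', 'Bob']
print(two_spaces)  # ['Bob']
```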

As I understand it, you have a list like this:

list_3 = [
  'First sentence. Another Sentence',
  'And yet one another. Sentence',
]

You are iterating over the list, but every iteration overwrites the test variable, so only the result for the last item is printed. You either have to accumulate the results in an additional variable or print them right away, on every iteration:

import re

regexp = r"\b[A-Z][a-z]*\b"  # the pattern from the question

acc = []
for item in list_3:
  acc.extend(re.findall(regexp, item))
print(acc)

or

for item in list_3:
  print(re.findall(regexp, item))
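Since the goal is one result per string (five in the asker's case), a third option is to keep a separate list for each item. This is a sketch using the two-item sample list above and the asker's original pattern:

```python
import re

list_3 = [
    'First sentence. Another Sentence',
    'And yet one another. Sentence',
]

# The pattern from the question
regexp = r"\b[A-Z][a-z]*\b"

# Build one result list per input string instead of overwriting `test`
per_item = [re.findall(regexp, item) for item in list_3]
for number, words in enumerate(per_item, start=1):
    print(number, words)
```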

As for a regexp that ignores the first word of a sentence, you can use:

re.findall(r'(?<!\A)(?<!\.)\s+[A-Z]\w+', s) 
  • (?<!\A) - not the beginning of the string
  • (?<!\.) - not the first word after a dot (the whitespace must not directly follow '.')
  • \s+ - one or more whitespace characters before the word
  • [A-Z]\w+ - a capital letter followed by at least one more word character

You'll receive words potentially prefixed by whitespace, so here's the final example:

import re

acc = []
for item in list_3:
  # strip() removes the whitespace that \s+ matched in front of each word
  words = [w.strip() for w in re.findall(r'(?<!\A)(?<!\.)\s+[A-Z]\w+', item)]
  acc.extend(words)
print(acc)
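Two details of this pattern are worth knowing: [A-Z]\w+ needs at least two characters, so a single-letter word such as 'I' is never matched, and (?<!\.) only looks for a dot, so words after '?' or '!' are still returned. The following is a hypothetical variant, not part of the original answer, that widens the check to [.?!], allows one-letter words with \w*, and uses a capture group so re.findall returns the bare word without needing strip(). Like the original, it still assumes a single space after sentence-ending punctuation:

```python
import re

# Original pattern from the answer above
ORIGINAL = r'(?<!\A)(?<!\.)\s+[A-Z]\w+'
# Hypothetical variant: also skip words after '?' or '!', allow one-letter words,
# and capture only the word so findall returns it without the leading whitespace
VARIANT = r'(?<!\A)(?<![.?!])\s+([A-Z]\w*)'

text = "So I went home! Then Bob called? Yes. Alice and I left."

original = [w.strip() for w in re.findall(ORIGINAL, text)]
variant = re.findall(VARIANT, text)
print(original)  # ['Then', 'Bob', 'Yes']
print(variant)   # ['I', 'Bob', 'I']
```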

As I really like regexes, try this one:

#!/usr/bin/env python3
import re

PATTERN = re.compile(r'[A-Z][A-Za-z0-9]*')

all_sentences = [
    "My House! is small",
    "Does Annie like Cats???"
]

def flat_list(sentences):
    for sentence in sentences:
        yield from PATTERN.findall(sentence)

upper_words = list(flat_list(all_sentences))
print(upper_words)

# Result: ['My', 'House', 'Does', 'Annie', 'Cats']
