
How to find words whose first letter is upper case and whose other letters are lower case

Problem Statement: Filter those words from the complete set of text6, having first letter in upper case and all other letters in lower case. Store the result in variable title_words. print the number of words present in title_words.

I have tried every approach I can think of, but I don't know where I am going wrong.

import nltk
from nltk.book import text6
title_words = 0
for item in set(text6):
    if item[0].isupper() and item[1:].islower():
        title_words += 1
print(title_words)

I have also tried it this way:

title_words = 0
for item in text6:
    if item[0].isupper() and item[1:].islower():
        title_words += 1
print(title_words)

I am not sure what count is required; whatever count I get, it does not let me pass the challenge. Please let me know if I am doing anything wrong in this code.

One of the above suggestions did work for me. Sample code below.

title_words = [word for word in text6 if (len(word)==1 and word[0].isupper()) or (word[0].isupper() and word[1:].islower()) ]
print(len(title_words))

The question says: "Store the result in variable title_words. print the number of words present in title_words."

The result of filtering a list of elements is a list of the same type of elements. In your case, filtering the list text6 (assuming it's a list of strings) would result in a (smaller) list of strings. Your title_words variable should be this filtered list, not the number of strings; the number of strings would just be the length of the list.

It's also ambiguous from the question whether capitalized words should be filtered out (i.e. removed, leaving the other words) or filtered for (i.e. kept in the list), so try both to see whether you're interpreting it incorrectly.
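As a rough illustration of both points, here is a minimal sketch using a small made-up token list in place of text6 (the list and the helper name is_title_case are assumptions for the example, not part of the exercise):

```python
# Hypothetical sample tokens standing in for text6 (assumed to be a sequence of strings).
tokens = ["SCENE", "King", "Arthur", "of", "the", "Britons", "I", "King"]

def is_title_case(word):
    # First letter upper case, every remaining letter lower case.
    return word[:1].isupper() and word[1:].islower()

# Interpretation 1: keep the matching words.
title_words = [w for w in tokens if is_title_case(w)]
# Interpretation 2: drop the matching words and keep the rest.
other_words = [w for w in tokens if not is_title_case(w)]

print(title_words, len(title_words))
print(other_words, len(other_words))
```

Either way, the variable holds a list and the count is just len() of that list.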

Give regular expressions a try:

>>> import re
>>> from nltk.book import text6
>>>
>>> text = ' '.join(set(text6))
>>> title_words = re.findall(r'([A-Z]{1}[a-z]+)', text)
>>> len(title_words)
461
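One caveat worth checking: the pattern above is not anchored, so it can also match pieces of longer tokens. A hedged sketch with a made-up token list (not text6) that compares it with re.fullmatch applied to each token:

```python
import re

# Hypothetical tokens standing in for set(text6).
tokens = ["McDonald", "King", "ARTHUR", "I"]

# Unanchored search over the joined text also picks up fragments of tokens.
unanchored = re.findall(r"[A-Z][a-z]+", " ".join(tokens))

# fullmatch requires the whole token to fit the pattern.
anchored = [t for t in tokens if re.fullmatch(r"[A-Z][a-z]+", t)]

print(unanchored)  # ['Mc', 'Donald', 'King']
print(anchored)    # ['King']
```

Whether this changes the count for text6 depends on the tokens it actually contains, so treat it as something to verify rather than a known difference.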

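There are 50 single-character elements (elements of length 1) in text6, but your code will not accept any of them, e.g. "I" or "W". Is that correct, or do you need words with a minimum length of 2?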
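For reference, single-letter tokens fail the check because slicing past the first character gives an empty string, and str.islower() returns False for an empty string. A small sketch (the lenient variant is just one possible way to accept such tokens, if the exercise wants that):

```python
word = "I"

rest = word[1:]  # empty string for a one-letter token
strict = word[0].isupper() and rest.islower()
lenient = word[0].isupper() and (rest == "" or rest.islower())

print(repr(rest))  # ''
print(strict)      # False
print(lenient)     # True
```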

I think the problem is with set(text6). I suggest you iterate over text6.tokens.

Update, explanation

The code you've provided is correct.

The issue is that the text can contain the same word multiple times. Calling set(words) removes the duplicates, so you start with an incomplete data set.

The other responses are not necessarily wrong in how they check the validity of a word, but they iterate over the same wrong data set.
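To make the difference concrete, here is a minimal sketch with a made-up token list (an assumption standing in for text6) showing how the count changes once duplicates are removed:

```python
# Hypothetical tokens; "King" appears three times.
tokens = ["King", "Arthur", "King", "the", "King"]

def is_title_case(word):
    # Same check as in the question.
    return word[0].isupper() and word[1:].islower()

count_all = sum(1 for w in tokens if is_title_case(w))
count_unique = sum(1 for w in set(tokens) if is_title_case(w))

print(count_all)     # 4
print(count_unique)  # 2
```

Which of the two counts the challenge expects is not stated in the question, so it may be worth trying both.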

Just a few changes, according to what the question asks:

from nltk.book import text6
title_words = []
for item in set(text6):
    if item[0].isupper() and item[1:].islower():
        title_words.append(item)
print(len(title_words))

Try this one:

title_words = [ word for word in text6 if word.istitle()]

print(len(title_words))
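Note that str.istitle() is not quite the same test as the manual check in the question, so the two can give different counts. A small comparison on a few made-up words (not taken from text6):

```python
words = ["King", "I", "O'Brien", "Don't", "KING"]

def manual(word):
    # The check used in the question.
    return word[0].isupper() and word[1:].islower()

for w in words:
    print(w, manual(w), w.istitle())

# King     True   True
# I        False  True   (empty remainder fails islower())
# O'Brien  False  True   (istitle() allows a capital after the apostrophe)
# Don't    True   False  (istitle() rejects a lowercase letter after the apostrophe)
# KING     False  False
```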


 