
Capitalizing the beginning of sentences in Python

The following code is for an assignment that asks the user to enter a string of sentences and then uses a function to capitalize the beginning of each sentence. For example, if the user enters 'hello. these are sample sentences. there are three of them.', the output should be 'Hello. These are sample sentences. There are three of them.'

I have created the following code:

def main():
    sentences = input('Enter sentences with lowercase letters: ')
    capitalize(sentences)

#This function capitalizes the first letter of each sentence
def capitalize(user_sentences):
    sent_list = user_sentences.split('. ')
    new_sentences = []
    count = 0

    for count in range(len(sent_list)):
        new_sentences = sent_list[count]
        new_sentences = (new_sentences +'. ')
        print(new_sentences.capitalize())

main()

This code has two issues that I am not sure how to correct. First, it prints each sentence as a new line. Second, it adds an extra period at the end. The output from this code using the sample input from above would be:

Hello.
These are sample sentences.
There are three of them..

Is there a way to format the output to be one line and remove the final period?

The following works for reasonably clean input:

>>> s = 'hello. these are sample sentences. there are three of them.'
>>> '. '.join(x.capitalize() for x in s.split('. '))
'Hello. These are sample sentences. There are three of them.'
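One thing to keep in mind about str.capitalize, which all of these answers use: besides uppercasing the first character, it lowercases the rest of the string, so 'i met Alice' becomes 'I met alice'. The assignment input is all lowercase, so it doesn't matter here. If existing capitals should survive, a minimal sketch is to uppercase only the first character of each piece. Both function names below are hypothetical helpers, not part of any library.

```python
def upper_first(s: str) -> str:
    # Uppercase only the first character and leave the rest untouched.
    # s[:1] is '' for an empty string, so this never raises IndexError.
    return s[:1].upper() + s[1:]


def capitalize_keep_case(text: str) -> str:
    # Same '. ' split/join as the one-liner above, but without
    # lowercasing the remainder of each sentence.
    return '. '.join(upper_first(x) for x in text.split('. '))


print(capitalize_keep_case('hello. i met Alice in Paris. it was fun.'))
# Hello. I met Alice in Paris. It was fun.
```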

If there is more varied whitespace around the full-stop, you might have to use some more sophisticated logic:

>>> '. '.join(x.strip().capitalize() for x in s.split('.'))

This normalizes the whitespace, which may or may not be what you want.
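Splitting on '.' also leaves an empty last piece when the input ends with a full stop, so that version's result ends in '. ' (a trailing space after the final full stop). A minimal sketch that drops empty pieces and restores a single final full stop, assuming every sentence ends with '.' (the function name is hypothetical):

```python
def capitalize_normalized(text: str) -> str:
    # Split on every full stop and trim the whitespace around each piece.
    pieces = [p.strip() for p in text.split('.')]
    # Drop empty pieces, such as the '' left after a trailing full stop.
    sentences = [p.capitalize() for p in pieces if p]
    if not sentences:
        return ''
    # Re-join with a normalized separator and put back the final full stop.
    return '. '.join(sentences) + '.'


print(capitalize_normalized('hello.these are sample sentences .  there are three of them.'))
# Hello. These are sample sentences. There are three of them.
```

Note the assumption: input that doesn't end with '.' gets one appended.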

def main():
    sentences = input('Enter sentences with lowercase letters: ')
    capitalizeFunc(sentences)

def capitalizeFunc(user_sentences):
    sent_list = user_sentences.split('. ')
    # Re-join with '. ' (not just '.') so the spaces removed by split come back
    print('. '.join(i.capitalize() for i in sent_list))

main()

Output:

Enter sentences with lowercase letters: hello. these are sample sentences. there are three of them.
Hello. These are sample sentences. There are three of them.

I think this might be helpful:

>>> sentence = input()    
>>> '. '.join(map(lambda s: s.strip().capitalize(), sentence.split('.')))

This code has two issues that I am not sure how to correct. First, it prints each sentence as a new line.

That's because you're printing each sentence with a separate call to print. By default, print adds a newline after its output. If you don't want it to, you can override what it adds with the end keyword parameter; if you don't want it to add anything at all, use end=''.
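A minimal sketch of the end='' fix applied to the original loop. It fixes only the first problem: the last sentence still ends in '..' because '. ' is still appended to every piece. The function name is hypothetical, and the optional file argument is just passed through to print (None means standard output) so the output can be captured.

```python
def print_on_one_line(sent_list, file=None):
    for sentence in sent_list:
        # end='' stops print from adding a newline after each sentence
        print(sentence.capitalize() + '. ', end='', file=file)
    print(file=file)  # a single newline once the loop is done


sent_list = 'hello. these are sample sentences. there are three of them.'.split('. ')
print_on_one_line(sent_list)
# Hello. These are sample sentences. There are three of them..
```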

Second, it adds an extra period at the end.

That's because you're explicitly adding a period to every sentence, including the last one.

One way to fix this is to keep track of the index as well as the sentence as you loop over them, e.g. with for index, sentence in enumerate(sentences):. Then you only add the period if index isn't the last one. Or, slightly more simply, you add the period at the start of each sentence whose index is anything but zero.
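Combining the "separator at the start" idea with end='' fixes both problems inside the loop. A minimal sketch, assuming the input was split on '. ' as in the question; the function name and the file pass-through are hypothetical additions for illustration.

```python
def print_with_leading_separator(sent_list, file=None):
    for index, sentence in enumerate(sent_list):
        # Emit the separator *before* every sentence except the first,
        # so nothing extra ends up after the last one.
        if index != 0:
            print('. ', end='', file=file)
        print(sentence.capitalize(), end='', file=file)
    print(file=file)  # a single newline once everything is printed


sent_list = 'hello. these are sample sentences. there are three of them.'.split('. ')
print_with_leading_separator(sent_list)
# Hello. These are sample sentences. There are three of them.
```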

However, there's a better way out of both of these problems. You split the string into sentences by splitting on '. '. You can join those sentences back into one big string by doing the exact opposite:

sentences = '. '.join(sentences)

Then you don't need a loop (there's one hidden inside join, of course), you don't need to treat the first or last sentence specially, and you only have one print instead of a bunch of them, so you don't need to worry about end.
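Put together, the question's capitalize function could look like the sketch below. One design change is an assumption on my part: it returns the joined string instead of printing it, so main() would call print(capitalize(sentences)) once. A literal string stands in for input() so the example runs on its own.

```python
def capitalize(user_sentences):
    # Split into sentences, capitalize each one, then glue them back
    # together with the same '. ' separator used for splitting.
    sent_list = user_sentences.split('. ')
    return '. '.join(sentence.capitalize() for sentence in sent_list)


# One print call, so everything ends up on a single line.
print(capitalize('hello. these are sample sentences. there are three of them.'))
# Hello. These are sample sentences. There are three of them.
```

Returning the string rather than printing inside the function also makes it easy to reuse or test.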

A different trick is to put the cleverness of print to work for you instead of fighting it. Not only does it add a newline at the end by default, it also lets you print multiple things and adds a space between them by default. For example, print(1, 2, 3) or, equivalently, print(*[1, 2, 3]) will print 1 2 3. You can override that space separator with anything else you want, so print(*sentences, sep='. ', end='') gets exactly what you want in one go. However, this may be a bit opaque or over-clever to people reading your code. Personally, whenever I can use join instead (which is usually), I do, even though it's a bit more typing, because it makes it more obvious what's happening.
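A small sketch of the sep/end trick, assuming the sentences have already been split on '. '. The function name and file pass-through are hypothetical; sep, end and file are standard print keyword parameters.

```python
def print_sentences(sent_list, file=None):
    capitalized = [sentence.capitalize() for sentence in sent_list]
    # sep='. ' is only inserted *between* arguments, so the last sentence
    # gets no extra separator; end='' suppresses the trailing newline.
    print(*capitalized, sep='. ', end='', file=file)


sent_list = 'hello. these are sample sentences. there are three of them.'.split('. ')
print_sentences(sent_list)
# Hello. These are sample sentences. There are three of them.
```

Dropping end='' would simply leave the default newline after the output, which is usually what you want at the end of a program anyway.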


As a side note, a bit of your code is misleading:

    new_sentences = []
    count = 0

    for count in range(len(sent_list)):
        new_sentences = sent_list[count]
        new_sentences = (new_sentences +'. ')
        print(new_sentences.capitalize())

The logic of that loop is fine, but it would be a lot easier to understand if you called the one-new-sentence variable new_sentence instead of new_sentences, and didn't set it to an empty list at the start. As it is, the reader is led to expect that you're going to build up a list of new sentences and then do something with it, but actually you just throw that list away at the start and handle each sentence one by one.

And, while we're at it, you don't need count here; just loop over sent_list directly:

for sentence in sent_list:
    new_sentence = sentence + '. '
    print(new_sentence.capitalize())

This does the same thing as the code you had, but I think it's easier to see that at a glance.

(Of course you still need the fixes for your two problems.)

Use nltk.sent_tokenize to split the string into sentences, then capitalize each sentence and join them back together.

A sentence doesn't always end with a '.'; it can end with other things too, like '?' or '!'. Also, three consecutive dots ('...') don't necessarily end a sentence. sent_tokenize is designed to handle these cases.

from nltk.tokenize import sent_tokenize

def capitalize(user_sentences):
    sents = sent_tokenize(user_sentences)
    capitalized_sents = [sent.capitalize() for sent in sents]
    joined_ = ' '.join(capitalized_sents)
    print(joined_)

Your sentences were printed on separate lines because print ends its output with a newline by default, so printing the sentences separately in a loop puts each one on its own line. You should print them all at once, after joining them. Alternatively, you can pass end='' to print so it doesn't end each sentence with a newline.

The second problem, the extra period at the end of the output, happens because you're appending '. ' to every sentence, including the last one. The good thing about sent_tokenize is that it doesn't remove '.', '?', etc. from the end of the sentences, so you don't have to append '. ' manually. Instead, you can just join the sentences with a space character, and you'll be good to go.
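If installing NLTK isn't an option, a rough standard-library approximation of the same idea is to split on whitespace that follows '.', '?' or '!'. That keeps the punctuation attached to each sentence, so they can be re-joined with a single space. This is a swapped-in technique, not what sent_tokenize does internally: it knows nothing about abbreviations such as 'Dr.' or 'e.g.', and it does split after '...'. The function name is hypothetical.

```python
import re


def capitalize_sentences(text: str) -> str:
    # Split on whitespace that follows '.', '?' or '!'; the lookbehind keeps
    # the punctuation at the end of each sentence instead of consuming it.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # The punctuation is still attached, so a plain space is enough to re-join.
    return ' '.join(s.capitalize() for s in sentences)


print(capitalize_sentences('hello. are these sample sentences? yes, three of them!'))
# Hello. Are these sample sentences? Yes, three of them!
```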

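If you get an error saying nltk isn't found, install it by running pip install nltk in your terminal/cmd. sent_tokenize also needs NLTK's Punkt tokenizer data; if NLTK reports that resource as missing, download it with nltk.download(), passing the resource name shown in the error message.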

>>> s = 'hello. these are sample sentences. there are three of them.'
>>> '. '.join(map(str.capitalize, s.split('. ')))
'Hello. These are sample sentences. There are three of them.'

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM