Capitalize first letter in each sentence in Python

I am looking to capitalize the first letter of each sentence; however, there are some HTML tags embedded in the string, as shown below:

this is my dog. <font color="red">he</font> is very nice. he likes to <b>play</b>. <b>he</b> likes to growl.

How can I ensure that the first letter of each sentence, ignoring the HTML tags, is capitalized while the tags are kept intact? Desired output:

This is my dog. <font color="red">He</font> is very nice. He likes to <b>play</b>. <b>He</b> likes to growl.

Any help would be appreciated.

Regular expressions aren't my strongest suit, but I believe something like this would work:

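import re

def capitalize(text):
    result = []

    # A "sentence" starts with a word character or '<' and runs up to the
    # next '.', '!' or '?'.
    sentences = re.findall(r'([\w<][^.!?]*[.!?])', text)
    for sentence in sentences:
        if sentence[0] != '<':
            # Upper-case only the first character; str.capitalize() would also
            # lower-case everything after it.
            sentence = sentence[0].upper() + sentence[1:]
        else:
            # Upper-case the first letter of the first text run between '>' and '<'.
            # count=1 leaves any later tagged text in the sentence alone.
            sentence = re.sub(
                r'>(.*?)<',
                lambda m: '>' + m.group(1)[:1].upper() + m.group(1)[1:] + '<',
                sentence,
                count=1,
            )

        result.append(sentence)

    return ' '.join(result)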

I believe that substitution regex could probably be simplified so that it doesn't have to insert the '>' and '<' back in.
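A minimal sketch of that simplification, assuming Python's `re` lookarounds are acceptable here: `(?<=>)` and `(?=<)` only check that the brackets are present without consuming them, so the replacement contains just the re-cased text. The helper name `capitalize_first_tagged` is hypothetical, and like the code above it only targets the first text run that follows a '>' in the sentence.

```python
import re

def capitalize_first_tagged(sentence: str) -> str:
    # Hypothetical helper. (?<=>) and (?=<) are zero-width, so the match is only
    # the text run after a '>' and before the next '<'; no brackets are re-inserted.
    # count=1 leaves any later tagged text in the sentence untouched.
    return re.sub(
        r'(?<=>)[^<]*(?=<)',
        lambda m: m.group(0)[:1].upper() + m.group(0)[1:],
        sentence,
        count=1,
    )

print(capitalize_first_tagged('<font color="red">he</font> is very nice.'))
# <font color="red">He</font> is very nice.
```

A function is passed as the replacement rather than a plain string so that any backslashes in the sentence text are inserted literally instead of being interpreted as escape sequences.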

This is what I came up with:

text = 'this is my dog. <font color="red">he</font> is very nice. he likes to <b>play</b>. <b>he</b> likes to growl.'
new_string = []
upcase_next_letter = True  # the first letter of the text starts a sentence
skip = False               # True while we are inside an HTML tag

for letter in text:
    if not skip:
        skip = letter == "<"
        if not skip:
            letter = letter.upper() if upcase_next_letter else letter
            # Stay armed through spaces; a '.' re-arms it, anything else disarms it.
            upcase_next_letter = (letter == " " and upcase_next_letter) or letter == "."
    else:
        # Keep skipping until the tag's closing '>'.
        skip = letter != ">"

    new_string.append(letter)

text = ''.join(new_string)

I just skip everything between angle brackets, and I uppercase the next letter after a dot (any spaces that follow the dot are passed over while waiting for that letter).
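The same skip-inside-tags idea can be packaged as a reusable function. This is a sketch of a variant, not the code above: the name `capitalize_sentences` and the `enders` parameter are hypothetical, and unlike the loop above it also treats '!' and '?' as sentence endings. It still assumes every '<' opens a well-formed tag, and abbreviations such as "e.g." will trigger capitalization too.

```python
def capitalize_sentences(text: str, enders: str = ".!?") -> str:
    """Upper-case the first character of each sentence, leaving HTML tags untouched.

    Assumes every '<' opens a well-formed tag closed by '>', and that any
    character in `enders` ends a sentence (so "e.g." also triggers it).
    """
    out = []
    upcase_next = True  # the start of the text counts as a sentence start
    in_tag = False      # True while scanning a tag, from '<' through its '>'

    for ch in text:
        if in_tag:
            in_tag = ch != ">"
        elif ch == "<":
            in_tag = True
        else:
            # Upper-case the first non-space character after a sentence ender.
            if upcase_next and not ch.isspace():
                ch = ch.upper()
                upcase_next = False
            # A sentence ender means the next non-space character starts a sentence.
            if ch in enders:
                upcase_next = True
        out.append(ch)

    return "".join(out)


print(capitalize_sentences("wow! <i>is</i> it? yes."))
# Wow! <i>Is</i> it? Yes.
```

As in the loop above, any non-space character after an ender clears the flag, so a decimal such as "2.5 is" does not capitalize "is".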
