Capitalize first letter in each sentence in Python
I want to capitalize the first letter of each sentence, but the string has some HTML tags embedded in it, like below.
this is my dog. <font color="red">he</font> is very nice. he likes to <b>play</b>. <b>he</b> likes to growl.
How can I ensure that the first letter of each sentence, excluding the HTML tags, is capitalized while still keeping the tags? Desired output:
This is my dog. <font color="red">He</font> is very nice. He likes to <b>play</b>. <b>He</b> likes to growl.
Any help would be appreciated.
Regular expressions aren't my strong suit, but I believe something like this would work.
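import re

def capitalize(text):
    result = []
    # A sentence starts at a word character or '<' and runs to the next '.', '!' or '?'.
    sentences = re.findall(r'([\w<][^.!?]*[.!?])', text)
    for sentence in sentences:
        if sentence[0] != '<':
            # Note: str.capitalize() also lowercases the rest of the sentence.
            sentence = sentence.capitalize()
        else:
            # The sentence opens with a tag: capitalize the text inside the first tag.
            tag_text = re.findall('>(.*?)<', sentence)
            first_tag = '>' + tag_text[0].capitalize() + '<'
            # count=1 so that only the first tag's text is replaced.
            sentence = re.sub('>(.*?)<', first_tag, sentence, count=1)
        result.append(sentence)
    return ' '.join(result)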
I believe that substitution regex could probably be simplified, so that the '>' and '<' don't have to be inserted back in.
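One way the simplification mentioned above might look is a single re.sub call with a replacement function, so nothing has to be split apart and glued back together. This is only a sketch: the name capitalize_sentences and the pattern are illustrative assumptions, and it assumes a sentence ends with '.', '!' or '?' followed by whitespace and that a '>' never appears inside a tag attribute.

```python
import re

# Group 1: start of the text, or a sentence terminator plus whitespace,
#          followed by any run of tags (and whitespace between them).
# Group 2: the first word character of the sentence, which gets uppercased.
_SENTENCE_START = re.compile(r'((?:^|[.!?]\s+)(?:<[^>]*>\s*)*)(\w)')

def capitalize_sentences(text):
    """Uppercase the first letter of each sentence, leaving tags untouched."""
    return _SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).upper(), text
    )

sample = ('this is my dog. <font color="red">he</font> is very nice. '
          'he likes to <b>play</b>. <b>he</b> likes to growl.')
print(capitalize_sentences(sample))
```

Because only the matched letter is changed, the rest of each sentence keeps its original case and spacing. Abbreviations such as "e.g. " would still be treated as sentence ends.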
This is what I came up with...
text = "this is my dog. <font color=\"red\">he</font> is very nice. he likes to <b>play</b>. <b>he</b> likes to growl."
newString = []
upcaseNextLetter = True   # the first letter of the text starts a sentence
skip = False              # True while we are inside a <...> tag
for letter in text:
    if not skip:
        skip = letter == "<"
        if not skip:
            letter = letter.upper() if upcaseNextLetter else letter
            # Stay armed across spaces after a dot; re-arm on every dot.
            upcaseNextLetter = (letter == " " and upcaseNextLetter) or letter == "."
    else:
        # Leave skip mode once the closing '>' has been seen.
        skip = not letter == ">"
    newString.append(letter)
text = ''.join(newString)
I just skip whenever we're between angle brackets, and I uppercase the next letter if we have just seen a dot, or a dot followed by spaces.
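The same scan can be wrapped in a function that also treats '!' and '?' as sentence ends. This is a rough sketch of that idea, not a drop-in replacement: the function name capitalize_sentences_scan and the terminators parameter are made up for illustration, and it assumes a '>' never appears inside a tag attribute.

```python
def capitalize_sentences_scan(text, terminators=".!?"):
    """Character-by-character scan that uppercases sentence starts outside tags."""
    out = []
    upcase_next = True   # the first letter of the text starts a sentence
    in_tag = False       # True while we are between '<' and '>'
    for ch in text:
        if in_tag:
            # Copy tag characters unchanged; leave tag mode on '>'.
            in_tag = ch != ">"
        elif ch == "<":
            in_tag = True
        elif ch in terminators:
            upcase_next = True
        elif upcase_next and not ch.isspace():
            ch = ch.upper()
            upcase_next = False
        out.append(ch)
    return "".join(out)

print(capitalize_sentences_scan("wow! <i>really</i>? yes."))
```

Keeping the tag check first means punctuation inside a tag, such as a dot in an attribute value, never triggers capitalization.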